EXCEPTIUS Corpus v1.0, containing the following data:
- raw documents for 21 countries at the national level
- data pre-processed with spacy-udpipe v1.0
- automatically annotated documents for the identification of exceptional measures at the sentence level

Country list (ISO 3166-1 alpha-2): AT, BE, HR, CY, CZ, DK, FR, DE, HU, IE, IT, LV, LT, NL, NO, PL, SI, SE, CH, UK

Folder structure: each country has a dedicated folder containing the following subfolders:
- raw_text: the raw text data (.txt format)
- processed: the output of spacy-udpipe v1.0. Each line is a sentence and contains the following information: tokens, lemmas, POS tags, and UD dependency relations.
- model: the predictions of the trained model (XML pre@36, as reported in Table 4 of the paper). Each line is a sentence with 9 tab-separated values, one for each exceptional-measure class; a value of 1 signals the presence of that class.

The Italy (IT) and Norway (NO) folders do not contain the model predictions.
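The prediction files in `model/` can be read with a short script. The following is a minimal Python sketch, not part of the corpus release. It assumes that each line ends with 9 tab-separated 0/1 labels (any preceding field is treated as the sentence text) and that every file inside `model/` is a prediction file. The actual file names and the class order are not documented here, and the helper names `parse_prediction_line`, `iter_predictions`, and `class_counts` are hypothetical.

```python
from pathlib import Path
from typing import Iterator, List, Optional, Tuple

# One column per exceptional-measure class; the class order is defined in the paper.
NUM_CLASSES = 9


def parse_prediction_line(line: str) -> Tuple[Optional[str], List[int]]:
    """Split one prediction line into (sentence text, 9 binary labels).

    Assumption: the last NUM_CLASSES tab-separated fields are the labels;
    any fields before them are taken as the sentence text (None if absent).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < NUM_CLASSES:
        raise ValueError(
            f"expected at least {NUM_CLASSES} tab-separated fields, got {len(fields)}"
        )
    text = "\t".join(fields[:-NUM_CLASSES]) or None
    # A value of 1 signals the presence of a class; strip() also removes a trailing "\r".
    labels = [int(f.strip() == "1") for f in fields[-NUM_CLASSES:]]
    return text, labels


def iter_predictions(
    country_dir: Path,
) -> Iterator[Tuple[Path, int, Optional[str], List[int]]]:
    """Yield (file, line number, text, labels) for every sentence in <country>/model/.

    Assumption (hypothetical layout): every file directly inside model/ is a prediction file.
    """
    model_dir = country_dir / "model"
    if not model_dir.is_dir():
        return  # predictions may be absent, as for the IT and NO folders
    for path in sorted(p for p in model_dir.iterdir() if p.is_file()):
        with path.open(encoding="utf-8") as fh:
            for lineno, line in enumerate(fh, start=1):
                if line.strip():  # skip blank lines
                    text, labels = parse_prediction_line(line)
                    yield path, lineno, text, labels


def class_counts(country_dir: Path) -> List[int]:
    """Count how many sentences are flagged for each of the 9 classes."""
    counts = [0] * NUM_CLASSES
    for _, _, _, labels in iter_predictions(country_dir):
        for i, flag in enumerate(labels):
            counts[i] += flag
    return counts
```

For example, `class_counts(Path("EXCEPTIUS/AT"))` would return one sentence count per class for the Austrian folder (the root folder name here is an assumption). Taking the labels from the end of each line keeps the parser working whether or not the sentence text is stored alongside the predictions.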
Date of availability: 29 November 2021
Publisher: DataverseNL