TY - JOUR
T1 - Automatic Lemmatization of Ancient Greek inscriptions: a presentation of AGILe
AU - Peels-Matthey, Saskia
AU - de Graaf, Evelien
AU - Stopponi, Silvia
AU - Bos, Jasper
AU - Nissim, Malvina
PY - 2024/7
Y1 - 2024/7
AB - In this paper, we present the first automatic lemmatizer for Ancient Greek Inscriptions (AGILe). Lemmatization of ancient texts, the process of tagging each word with its base form as given in the dictionary entry, benefits researchers, since a search on a lemmatized corpus can retrieve all occurrences of a lemma in a single query. Whereas the corpus of literary texts (e.g. the Thesaurus Linguae Graecae) has been lemmatized, the vast majority of Ancient Greek inscriptions has not. Lemmatization is especially useful for inscriptions, since these texts show a great amount of dialectal and spelling variation, but lemmatizing this vast corpus by hand would be an enormous task. We evaluated the performance of five existing automatic lemmatizers, developed for literary Greek, on epigraphic texts. Since their performance was disappointing (61.5% accuracy at best), we developed a new lemmatizer dedicated to Greek inscriptions, which reaches an accuracy of 85.6%. We provide a detailed error analysis as well as concrete suggestions for future improvement, as first steps towards the integration of AGILe into an online corpus of inscriptions.
KW - Ancient Greek
KW - digital humanities
KW - lemmatization
KW - inscriptions
U2 - 10.19272/202413701002
DO - 10.19272/202413701002
M3 - Article
SN - 2612-3517
VL - 7
SP - 29
EP - 50
JO - Journal of Epigraphic Studies
JF - Journal of Epigraphic Studies
ER -