AGILe: The First Lemmatizer for Ancient Greek Inscriptions

Evelien de Graaf, Silvia Stopponi, Jasper Bos, Saskia Peels-Matthey, Malvina Nissim

Research output: Academic, peer-reviewed

Abstract

To facilitate corpus searches by classicists as well as to reduce data sparsity when training models, we focus on the automatic lemmatization of ancient Greek inscriptions, which have not received as much attention in this respect as literary text data has. We show that existing lemmatizers for ancient Greek, trained on literary data, perform poorly on epigraphic data, due to major language differences between the two types of texts. We thus train the first inscription-specific lemmatizer, achieving above 80% accuracy, and make both the models and the lemmatized data available to the community. We also provide a detailed error analysis of the peculiarities of inscriptions, which again underscores the importance of a lemmatizer dedicated to inscriptions.
Original language: English
Title: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022)
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Place of publication: Marseille, France
Publisher: European Language Resources Association (ELRA)
Pages: 5334-5344
Number of pages: 11
ISBN (print): 9791095546726
Status: Published - June 2022
Event: The 13th Conference on Language Resources and Evaluation - Palais du Pharo, Marseille, France
Duration: 20 June 2022 to 25 June 2022
https://lrec2022.lrec-conf.org/en/

Conference

Conference: The 13th Conference on Language Resources and Evaluation
Short title: LREC 2022
Country/Region: France
City: Marseille
Period: 20/06/2022 to 25/06/2022