Abstract
We apply two measures of lexical semantic change detection to Word2Vec embeddings trained on a diachronic corpus of literary Ancient Greek texts. The two measures are the Vector Coherence, based on the comparison between vectors of the same word in different time periods, and the J, based on the Jaccard coefficient, which quantifies the overlap between the k nearest neighbours in each possible combination of time slices. Through the analysis of the most stable and unstable words detected with both measures, we show that the two measures are effective at finding non-changed words, while Vector Coherence seems to be more reliable than J at detecting changed words. Still, low J could indicate a real semantic change when the same word also has a low Vector Coherence. For both measures, the detection of changed words is hampered by the presence of lemmatization errors in the training corpus.
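
The two measures described above can be illustrated with a minimal sketch. It rests on assumptions the abstract does not confirm: that Vector Coherence is computed as the cosine similarity between a word's vectors in aligned per-slice embedding spaces, and that both measures are averaged over every pair of time slices. The function names (`vector_coherence`, `j_score`) are hypothetical and this is not the authors' implementation.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def vector_coherence(vectors_by_slice):
    """Mean cosine similarity of one word's vectors over all slice pairs.

    Assumption: the per-slice embedding spaces are already aligned, so
    vectors from different time slices are directly comparable.
    """
    pairs = list(combinations(vectors_by_slice, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def jaccard(a, b):
    """Jaccard coefficient: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def j_score(neighbours_by_slice):
    """Mean Jaccard overlap of a word's k nearest neighbours over all slice pairs.

    Assumption: averaging over pairs is one plausible aggregation; the
    abstract only states that every combination of slices is compared.
    """
    pairs = list(combinations(neighbours_by_slice, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Under these assumptions, each function takes one entry per time slice for a single word: a list of vectors for `vector_coherence`, and a list of neighbour lists (for example, the top-k lemmas returned by a Word2Vec model) for `j_score`. Both return values close to 1 for stable words and lower values for words whose vectors or neighbourhoods diverge across slices.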
Original language | English |
---|---|
Title | The First Workshop on Data-driven Approaches to Ancient Languages (DAAL 2024) |
Subtitle | Proceedings of the Workshop |
Editors | Colin Swaelens, Maxime Deforche, Ilse De Vos, Els Lefever |
Place of publication | Gent, Belgium |
Publisher | Ghent University |
Pages | 47-57 |
Number of pages | 11 |
ISBN (print) | 9789078848127 |
Status | Published - 27 Jun 2024 |
Event | The First Workshop on Data-driven Approaches to Ancient Languages (DAAL 2024) - Mercator A104 (Abdisstraat 1, 9000 Ghent, Belgium), Gent, Belgium Duration: 27 Jun 2024 → 27 Jun 2024 https://www.dbbe2024.ugent.be/workshop/ |
Workshop
Workshop | The First Workshop on Data-driven Approaches to Ancient Languages (DAAL 2024) |
---|---|
Abbreviated title | DAAL 2024 |
Country/Region | Belgium |
City | Gent |
Period | 27/06/2024 → 27/06/2024 |
Internet address |