Abstract
We apply two measures of lexical semantic change detection to Word2Vec embeddings trained on a diachronic corpus of literary Ancient Greek texts. The first, Vector Coherence, compares the vectors of the same word in different time periods; the second, J, is based on the Jaccard coefficient and quantifies the overlap between a word's k nearest neighbours in each possible combination of time slices. By analysing the most stable and most unstable words detected by each measure, we show that both measures are effective at finding unchanged words, while Vector Coherence appears more reliable than J at detecting changed words. Still, a low J may indicate genuine semantic change when the same word also has a low Vector Coherence. For both measures, the detection of changed words is hampered by lemmatization errors in the training corpus.
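To make the two measures concrete, here is a minimal pure-Python sketch of Vector Coherence and J as described above. It is an illustration under stated assumptions, not the authors' implementation: each time slice is a plain dict from lemma to vector (in practice these would come from the trained Word2Vec models), the slices are assumed to be already aligned for Vector Coherence, both scores are assumed to be averages over every pair of slices, and the function names, toy vectors and the thresholds in the hypothetical `flag_change` rule are invented for the example.

```python
import math
from itertools import combinations

# Each time slice is a dict mapping a lemma to its embedding (list of floats).
# ASSUMPTION: for Vector Coherence the slices are already aligned in a shared
# space; how the alignment is done is not specified here.


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def vector_coherence(word, slices):
    """Mean cosine similarity of the word's own vectors over every pair of slices.

    ASSUMPTION: averaging over slice pairs is an illustrative choice.
    """
    sims = [cosine(a[word], b[word]) for a, b in combinations(slices, 2)]
    return sum(sims) / len(sims)


def nearest_neighbours(word, space, k):
    """Set of the k words closest to `word` by cosine similarity within one slice."""
    target = space[word]
    scored = sorted(
        ((cosine(target, vec), other) for other, vec in space.items() if other != word),
        reverse=True,
    )
    return {other for _, other in scored[:k]}


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def j_score(word, slices, k):
    """Mean Jaccard overlap of the word's k-NN sets over every pair of slices.

    ASSUMPTION: pairwise averaging is one reading of "each possible
    combination of time slices".
    """
    neighbour_sets = [nearest_neighbours(word, space, k) for space in slices]
    overlaps = [jaccard(a, b) for a, b in combinations(neighbour_sets, 2)]
    return sum(overlaps) / len(overlaps)


def flag_change(word, slices, k, vc_max, j_max):
    """HYPOTHETICAL rule: trust a low J only when Vector Coherence is also low."""
    return vector_coherence(word, slices) < vc_max and j_score(word, slices, k) < j_max


# Made-up 2-D toy slices: "a" moves from the neighbourhood of "b" to that of "c".
early = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
late = {"a": [0.0, 1.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}

print(vector_coherence("a", [early, late]))  # 0.0
print(j_score("a", [early, late], k=1))      # 0.0
print(flag_change("a", [early, late], k=1, vc_max=0.5, j_max=0.5))  # True
```

Because J only compares neighbour sets computed inside each slice, it does not require the slices to share a coordinate system, whereas comparing a word's own vectors directly, as this sketch of Vector Coherence does, only makes sense once the spaces are aligned.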
Original language | English |
---|---|
Title of host publication | The First Workshop on Data-driven Approaches to Ancient Languages (DAAL 2024) |
Subtitle of host publication | Proceedings of the Workshop |
Editors | Colin Swaelens, Maxime Deforche, Ilse De Vos, Els Lefever |
Place of Publication | Gent, Belgium |
Publisher | Ghent University |
Pages | 47-57 |
Number of pages | 11 |
ISBN (Print) | 9789078848127 |
Publication status | Published - 27-Jun-2024 |
Event | The First Workshop on Data-driven Approaches to Ancient Languages (DAAL 2024), Mercator A104 (Abdisstraat 1, 9000 Ghent, Belgium), Gent, Belgium. Duration: 27-Jun-2024 → 27-Jun-2024. https://www.dbbe2024.ugent.be/workshop/ |
Workshop
Workshop | The First Workshop on Data-driven Approaches to Ancient Languages (DAAL 2024) |
---|---|
Abbreviated title | DAAL 2024 |
Country/Territory | Belgium |
City | Gent |
Period | 27/06/2024 → 27/06/2024 |
Internet address | https://www.dbbe2024.ugent.be/workshop/ |
Keywords
- semantic change detection
- Ancient Greek
- language modelling
- ancient language
- word embeddings
- word2vec