Abstract
Cross-lingual word embedding models learn a shared vector space for two or more languages so that words with similar meanings are represented by similar vectors regardless of their language. Although existing models achieve high performance on pairs of morphologically simple languages, they perform very poorly on morphologically rich languages such as Turkish and Finnish. In this paper, we propose a morpheme-based model to improve the performance of cross-lingual word embeddings on morphologically rich languages. Our model includes a simple extension that enables us to exploit morphemes for cross-lingual mapping. We applied our model to the Turkish-Finnish language pair on the bilingual word translation task. Results show that our model outperforms the baseline models by 2% in the nearest neighbour ranking.
Original language | English |
---|---|
Pages | 1222-1228 |
Number of pages | 7 |
Publication status | Published - 2-Sept-2019 |
Event | Recent Advances in Natural Language Processing 2019 - Varna, Bulgaria Duration: 2-Sept-2019 → 4-Sept-2019 http://lml.bas.bg/ranlp2019/start.php |
Conference
Conference | Recent Advances in Natural Language Processing 2019 |
---|---|
Abbreviated title | RANLP 2019 |
Country/Territory | Bulgaria |
City | Varna |
Period | 02/09/2019 → 04/09/2019 |