Cross-Lingual Word Embeddings for Morphologically Rich Languages

    Research output: Contribution to conference > Paper > Academic

    Abstract

    Cross-lingual word embedding models learn a shared vector space for two or
    more languages so that words with similar meaning are represented by
    similar vectors regardless of their language. Although the existing models
    achieve high performance on pairs of morphologically simple languages,
    they perform very poorly on morphologically rich languages such as Turkish
    and Finnish. In this paper, we propose a morpheme-based model in order to
    increase the performance of cross-lingual word embeddings on
    morphologically rich languages. Our model includes a simple extension
    which enables us to exploit morphemes for cross-lingual mapping. We
    applied our model to the Turkish-Finnish language pair on the bilingual
    word translation task. Results show that our model outperforms the
    baseline models by 2% in the nearest neighbour ranking.
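
The evaluation setting named in the abstract (bilingual word translation scored by nearest-neighbour ranking in a shared space) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy random vectors, the function names, and the use of an orthogonal Procrustes mapping, which is a common way to align two embedding spaces but is not confirmed here as the paper's method. The morpheme-based extension is not modelled.

```python
import numpy as np


def learn_orthogonal_map(src, tgt):
    """Orthogonal Procrustes: the orthogonal W minimising ||src @ W - tgt||_F."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def nearest_neighbours(mapped_src, tgt):
    """For each mapped source vector, index of the most cosine-similar target vector."""
    a = mapped_src / np.linalg.norm(mapped_src, axis=1, keepdims=True)
    b = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    return np.argmax(a @ b.T, axis=1)


def precision_at_1(predicted, gold):
    """Fraction of source words whose top-ranked neighbour is the gold translation."""
    return float(np.mean(predicted == gold))


# Toy data (hypothetical): the "target language" space is a rotated copy of
# the "source language" space, so source word i translates to target word i.
rng = np.random.default_rng(0)
n_words, dim = 20, 8
src_vecs = rng.normal(size=(n_words, dim))
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
tgt_vecs = src_vecs @ rotation

# Learn the mapping from the seed dictionary, then translate by nearest neighbour.
W = learn_orthogonal_map(src_vecs, tgt_vecs)
pred = nearest_neighbours(src_vecs @ W, tgt_vecs)
gold = np.arange(n_words)
score = precision_at_1(pred, gold)
print(f"precision@1 = {score:.2f}")
```

In this noise-free toy setup the mapping should recover every translation pair. With real embeddings the two spaces are only approximately related by a rotation, which is where scores on morphologically rich language pairs would be expected to drop.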
    Original language: English
    Pages: 1222-1228
    Number of pages: 7
    Publication status: Published - 2-Sept-2019
    Event: Recent Advances in Natural Language Processing 2019 - Varna, Bulgaria
    Duration: 2-Sept-2019 to 4-Sept-2019
    http://lml.bas.bg/ranlp2019/start.php

    Conference

    Conference: Recent Advances in Natural Language Processing 2019
    Abbreviated title: RANLP 2019
    Country/Territory: Bulgaria
    City: Varna
    Period: 02/09/2019 to 04/09/2019