Cross-Lingual Word Embeddings for Morphologically Rich Languages



    Cross-lingual word embedding models learn a shared vector space for two or more languages so that words with similar meanings are represented by similar vectors regardless of their language. Although existing models achieve high performance on pairs of morphologically simple languages, they perform very poorly on morphologically rich languages such as Turkish and Finnish. In this paper, we propose a morpheme-based model to improve the performance of cross-lingual word embeddings on morphologically rich languages. Our model includes a simple extension that enables us to exploit morphemes for cross-lingual mapping. We applied our model to the Turkish-Finnish language pair on the bilingual word translation task. Results show that our model outperforms the baseline models by 2% in nearest-neighbour ranking.
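    The abstract reports results as a nearest-neighbour ranking on bilingual word translation. The sketch below shows one common way such an evaluation is computed: precision@k, meaning the fraction of source words whose gold translation appears among the k target words closest to them in the shared space. It is not the authors' code. Plain cosine similarity, the function names, and the toy Turkish-Finnish vectors are all assumptions made for illustration, and the abstract does not state which k or similarity measure the paper uses.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def precision_at_k(mapped_src, tgt_vocab, gold, k=1):
    """Fraction of test source words whose gold translation is among the k nearest targets.

    mapped_src: source word -> vector, already mapped into the shared space
    tgt_vocab:  target word -> vector in the shared space
    gold:       source word -> set of acceptable target translations
    """
    hits = 0
    evaluated = 0
    for src_word, translations in gold.items():
        if src_word not in mapped_src:
            continue  # skip test words that have no embedding
        query = mapped_src[src_word]
        # Rank every target word by similarity to the query, most similar first.
        ranked = sorted(tgt_vocab, key=lambda w: cosine(query, tgt_vocab[w]), reverse=True)
        evaluated += 1
        if any(w in translations for w in ranked[:k]):
            hits += 1
    return hits / evaluated if evaluated else 0.0


# Hypothetical toy data: 2-d vectors invented purely for illustration.
mapped_src = {"ev": [1.0, 0.1], "kedi": [0.1, 1.0], "su": [0.7, 0.7]}   # Turkish
tgt_vocab = {"talo": [0.9, 0.0], "kissa": [0.0, 1.0], "vesi": [-1.0, 0.2]}  # Finnish
gold = {"ev": {"talo"}, "kedi": {"kissa"}, "su": {"vesi"}}

p1 = precision_at_k(mapped_src, tgt_vocab, gold, k=1)
p3 = precision_at_k(mapped_src, tgt_vocab, gold, k=3)
print(f"P@1 = {p1:.3f}, P@3 = {p3:.3f}")
```

    In this toy setup "ev" and "kedi" retrieve their translations first, while "su" (water) lands nearer to "talo" and "kissa" than to "vesi". The result is P@1 = 0.667 and P@3 = 1.000. A morpheme-aware model would change only how the vectors in `mapped_src` and `tgt_vocab` are built. The ranking step itself stays the same.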
    Original language: English
    Number of pages: 7
    Status: Published - 2 Sep 2019
    Event: Recent Advances in Natural Language Processing 2019 - Varna, Bulgaria
    Duration: 2 Sep 2019 to 4 Sep 2019


    Conference: Recent Advances in Natural Language Processing 2019
    Abbreviated title: RANLP 2019
