Cross-Lingual Word Embeddings for Morphologically Rich Languages

    Research output: Academic, peer review

    Abstract

    Cross-lingual word embedding models learn a shared vector space for two or more languages, so that words with similar meanings are represented by similar vectors regardless of their language. Although existing models achieve high performance on pairs of morphologically simple languages, they perform very poorly on morphologically rich languages such as Turkish and Finnish. In this paper, we propose a morpheme-based model to improve the performance of cross-lingual word embeddings on morphologically rich languages. Our model includes a simple extension that enables us to exploit morphemes for cross-lingual mapping. We applied our model to the Turkish-Finnish language pair on the bilingual word translation task. Results show that our model outperforms the baseline models by 2% in the nearest neighbour ranking. © 2019 Association for Computational Linguistics (ACL).
    Original language: English
    Title: 12th International Conference on Recent Advances in Natural Language Processing
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 1222-1228
    Number of pages: 7
    ISBN (print): 978-954452055-7
    Status: Published - 2 Sep 2019
    Event: Recent Advances in Natural Language Processing 2019 - Varna, Bulgaria
    Duration: 2 Sep 2019 to 4 Sep 2019
    http://lml.bas.bg/ranlp2019/start.php

    Conference

    Conference: Recent Advances in Natural Language Processing 2019
    Abbreviated title: RANLP 2019
    Country/Region: Bulgaria
    City: Varna
    Period: 02/09/2019 to 04/09/2019
