Neural Machine Translation for English–Kazakh with Morphological Segmentation and Synthetic Data

Antonio Toral Ruiz, Lukas Edman, Jennifer Spenader, Galiya Yeshmagambetova

Research output: Academic, peer review

Abstract

This paper presents the systems submitted by the University of Groningen to the English-Kazakh language pair (both translation directions) for the WMT 2019 news translation task. We explore the potential benefits of (i) morphological segmentation (both unsupervised and rule-based), given the agglutinative nature of Kazakh, (ii) data from two additional languages (Turkish and Russian), given the scarcity of English-Kazakh data, and (iii) synthetic data, both for the source and for the target language. Our best submissions ranked second for Kazakh-English and third for English-Kazakh in terms of the BLEU automatic evaluation metric.
Original language: English
Title of host publication: Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)
Place of publication: Florence, Italy
Publisher: Association for Computational Linguistics (ACL)
Pages: 386-392
Number of pages: 7
Volume: 2
Status: Published, 1 August 2019