Neural Machine Translation for English–Kazakh with Morphological Segmentation and Synthetic Data

Antonio Toral Ruiz, Lukas Edman, Jennifer Spenader, Galiya Yeshmagambetova

Research output: Academic, peer-reviewed

This paper presents the systems submitted by the University of Groningen for the English–Kazakh language pair (both translation directions) in the WMT 2019 news translation task. We explore the potential benefits of (i) morphological segmentation, both unsupervised and rule-based, given the agglutinative nature of Kazakh; (ii) data from two additional languages (Turkish and Russian), given the scarcity of English–Kazakh data; and (iii) synthetic data for both the source and the target language. Our best submissions ranked second for Kazakh–English and third for English–Kazakh according to the BLEU automatic evaluation metric.
Original language: English
Title of host publication: Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)
Place of publication: Florence, Italy
Publisher: Association for Computational Linguistics (ACL)
Number of pages: 7
Status: Published, 1 Aug 2019