Class-Based Language Modeling for Translating into Morphologically Rich Languages

Arianna Bisazza, Christof Monz

Research output: Academic, peer-reviewed


Abstract

Class-based language modeling (LM) is a long-studied and effective approach to overcoming data sparsity in n-gram model training. In statistical machine translation (SMT), different forms of class-based LMs have been shown to improve baseline translation quality when used in combination with standard word-level LMs, but no published work has systematically compared different kinds of classes, model forms and LM combination methods in a unified SMT setting. This paper aims to fill these gaps by focusing on the challenging problem of translating into Russian, a language with rich inflectional morphology and complex agreement phenomena. We conduct our evaluation in a large-data scenario and report statistically significant BLEU improvements of up to 0.6 points when using a refined variant of the class-based model originally proposed by Brown et al. (1992).
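For readers unfamiliar with the class-based model of Brown et al. (1992), the following is a minimal sketch of its basic bigram form. In this form the probability of a word given its predecessor factors through word classes: P(w_i | w_{i-1}) = P(c(w_i) | c(w_{i-1})) · P(w_i | c(w_i)). This is the textbook formulation, not the refined variant evaluated in the paper. The class mapping `word2class`, the toy English corpus and the function names are hypothetical. Estimates are unsmoothed maximum likelihood, and inducing the classes themselves (e.g. by Brown clustering) is not shown.

```python
from collections import Counter

def train_class_bigram(sentences, word2class):
    """Collect the counts needed for an unsmoothed class bigram model."""
    hist = Counter()   # class counts in the history (previous-word) position
    trans = Counter()  # (previous class, current class) pair counts
    emit = Counter()   # word counts in the predicted position
    cls = Counter()    # class counts in the predicted position
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            cp, cc = word2class[prev], word2class[cur]
            hist[cp] += 1
            trans[(cp, cc)] += 1
            emit[cur] += 1
            cls[cc] += 1
    return hist, trans, emit, cls

def class_bigram_prob(prev, cur, model, word2class):
    """P(cur | prev) = P(c(cur) | c(prev)) * P(cur | c(cur)), MLE estimates."""
    hist, trans, emit, cls = model
    cp, cc = word2class[prev], word2class[cur]
    if hist[cp] == 0 or cls[cc] == 0:
        return 0.0  # class never observed: no probability mass without smoothing
    p_class = trans[(cp, cc)] / hist[cp]
    p_word = emit[cur] / cls[cc]
    return p_class * p_word

# Hypothetical toy data: hand-assigned noun (N) and verb (V) classes.
word2class = {"<s>": "BOS", "</s>": "EOS", "cat": "N", "dog": "N",
              "sleeps": "V", "runs": "V"}
sentences = [["cat", "sleeps"], ["dog", "runs"], ["cat", "runs"]]
model = train_class_bigram(sentences, word2class)

# "dog sleeps" never occurs in the data, yet it gets probability via N -> V.
print(class_bigram_prob("dog", "sleeps", model, word2class))
```

Because "dog" and "cat" share a class, the unseen bigram "dog sleeps" receives probability 1/3 rather than zero. This sharing of statistics across words in the same class is what makes class-based LMs useful against sparsity, especially for highly inflected target languages such as Russian.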
Original language: English
Title: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics
Subtitle: Technical Papers
Publisher: Association for Computational Linguistics, ACL Anthology
Pages: 1918-1927
Number of pages: 10
Print ISBN: 9781941643266
Status: Published - 2014
Externally published: Yes
Event: 25th International Conference on Computational Linguistics - Dublin, Ireland
Duration: 23 Aug 2014 to 29 Aug 2014
Conference number: 25

Conference

Conference: 25th International Conference on Computational Linguistics
Short title: COLING
Country/Region: Ireland
City: Dublin
Period: 23/08/2014 to 29/08/2014
