Abstract
Class-based language modeling (LM) is a long-studied and effective approach to overcome data sparsity in the context of n-gram model training. In statistical machine translation (SMT), different forms of class-based LMs have been shown to improve baseline translation quality when used in combination with standard word-level LMs, but no published work has systematically compared different kinds of classes, model forms and LM combination methods in a unified SMT setting. This paper aims to fill this gap by focusing on the challenging problem of translating into Russian, a language with rich inflectional morphology and complex agreement phenomena. We conduct our evaluation in a large-data scenario and report statistically significant BLEU improvements of up to 0.6 points when using a refined variant of the class-based model originally proposed by Brown et al. (1992).
Original language | English |
---|---|
Title | Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics |
Subtitle | Technical Papers |
Publisher | Association for Computational Linguistics, ACL Anthology |
Pages | 1918-1927 |
Number of pages | 10 |
ISBN (print) | 9781941643266 |
Status | Published - 2014 |
Externally published | Yes |
Event | 25th International Conference on Computational Linguistics - Dublin, Ireland; Duration: 23 Aug 2014 → 29 Aug 2014; Conference number: 25 |
Conference
Conference | 25th International Conference on Computational Linguistics |
---|---|
Short title | COLING |
Country/Region | Ireland |
City | Dublin |
Period | 23/08/2014 → 29/08/2014 |