Class-Based Language Modeling for Translating into Morphologically Rich Languages

Arianna Bisazza, Christof Monz

Research output: Chapter in Book/Report/Conference proceeding > Chapter > Academic > peer-review

Abstract

Class-based language modeling (LM) is a long-studied and effective approach to overcome data sparsity in the context of n-gram model training. In statistical machine translation (SMT), different forms of class-based LMs have been shown to improve baseline translation quality when used in combination with standard word-level LMs, but no published work has systematically compared different kinds of classes, model forms, and LM combination methods in a unified SMT setting. This paper aims to fill this gap by focusing on the challenging problem of translating into Russian, a language with rich inflectional morphology and complex agreement phenomena. We conduct our evaluation in a large-data scenario and report statistically significant BLEU improvements of up to 0.6 points when using a refined variant of the class-based model originally proposed by Brown et al. (1992).

Original language: English
Title of host publication: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics
Subtitle of host publication: Technical Papers
Publisher: Association for Computational Linguistics, ACL Anthology
Pages: 1918-1927
Number of pages: 10
ISBN (Print): 9781941643266
Publication status: Published - 2014
Externally published: Yes
Event: 25th International Conference on Computational Linguistics - Dublin, Ireland
Duration: 23-Aug-2014 to 29-Aug-2014
Conference number: 25

Conference

Conference: 25th International Conference on Computational Linguistics
Abbreviated title: COLING
Country: Ireland
City: Dublin
Period: 23/08/2014 to 29/08/2014
