Measuring the effect of conversational aspects on machine translation quality

Marlies Van Der Wees, Arianna Bisazza, Christof Monz

Research output: Chapter in Book/Report/Conference proceeding; Chapter; Academic; peer-review

9 Citations (Scopus)
39 Downloads (Pure)

Abstract

Research in statistical machine translation (SMT) is largely driven by formal translation tasks, while translating informal text is much more challenging. In this paper we focus on SMT for the informal genre of dialogues, which has rarely been addressed to date. Concretely, we investigate the effect of dialogue acts, speakers, gender, and text register on SMT quality when translating fictional dialogues. We first create and release a corpus of multilingual movie dialogues annotated with these four dialogue-specific aspects. When measuring translation performance for each of these variables, we find that BLEU fluctuations between their categories are often significantly larger than randomly expected. Following this finding, we hypothesize and show that SMT of fictional dialogues benefits from adaptation towards dialogue acts and registers. Finally, we find that male speakers are harder to translate and use more vulgar language than female speakers, and that vulgarity is often not preserved during translation.
Original language: English
Title of host publication: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics
Subtitle of host publication: Technical Papers
Publisher: Association for Computational Linguistics, ACL Anthology
Pages: 2571-2581
Number of pages: 11
ISBN (Print): 9784879747020
Publication status: Published - 2016
Externally published: Yes
Event: The 26th International Conference on Computational Linguistics - Osaka, Japan
Duration: 13-Dec-2016 to 16-Dec-2016
http://coling2016.anlp.jp/

Conference

Conference: The 26th International Conference on Computational Linguistics
Abbreviated title: COLING 2016
Country/Territory: Japan
City: Osaka
Period: 13/12/2016 to 16/12/2016
