Literary-adapted machine translation in a well-resourced language pair: Explorations with More Data and Wider Contexts

Research output: Chapter in Book/Report/Conference proceeding; Chapter; Academic; peer-review

Abstract

Following recent work on literary-adapted machine translation (MT) systems, this paper investigates whether it is worthwhile to build such a system for a reasonably well-resourced language pair, English-to-Dutch, for which generic MT systems (e.g. DeepL) are known to be competitive. Specifically, it presents a system that uses considerably more in-domain training data (novels) than previous work, and explores the use of instances longer than isolated sentence pairs (i.e. document-level MT). The systems are evaluated on a sizable test set of 31 English-language novels and their published Dutch human translations. The evaluation is multidimensional, comprising automatic MT evaluation metrics, error- and survey-based human evaluation, and quantitative automatic analyses, among them the novel use of literariness prediction for translations. The results show that, overall, a literary-adapted system that combines sentence- and document-level information performs slightly better than DeepL (4% higher COMET score), with the edge being wider for genre fiction, while the gains over DeepL are smaller or negative for literary fiction. Code, data (public domain subset), and trained systems are available at https://github.com/antot/lit-mt-en-nl.
Original language: English
Title of host publication: Computer-Assisted Literary Translation
Editors: Andrew Rothwell, Andy Way, Roy Youdale
Publisher: Routledge
Chapter: 1
Pages: 27-52
Number of pages: 26
ISBN (Electronic): 9781003357391
ISBN (Print): 9781032413006
Publication status: Published - 2024
