Abstract
Following recent work on literary-adapted machine translation (MT) systems, this paper investigates whether it is worthwhile building such a system for a reasonably well-resourced language pair, English-to-Dutch, for which generic MT systems (e.g. DeepL) are known to be competitive. Specifically, it presents a system that uses considerably more in-domain training data (novels) than previous work, and explores the use of instances longer than isolated sentence pairs (i.e. document-level MT). The evaluation uses a sizable test set of 31 English-language novels and their published Dutch human translations. It is multidimensional, comprising automatic MT evaluation metrics, error- and survey-based human evaluation, and quantitative automatic analyses, including the novel use of literariness prediction of translations. The results show that, overall, a literary-adapted system that combines sentence- and document-level information performs slightly better than DeepL (4% higher COMET score); the edge is wider for genre fiction, while the gains over DeepL are smaller or negative for literary fiction. Code, data (public domain subset), and trained systems are available at https://github.com/antot/lit-mt-en-nl.
Field | Value |
---|---|
Original language | English |
Title of host publication | Computer-Assisted Literary Translation |
Editors | Andrew Rothwell, Andy Way, Roy Youdale |
Publisher | Routledge |
Chapter | 1 |
Pages | 27-52 |
Number of pages | 26 |
ISBN (Electronic) | 9781003357391 |
ISBN (Print) | 9781032413006 |
Publication status | Published - 2024 |