Automatic Discrimination of Human and Neural Machine Translation: A Study with Multiple Pre-Trained Models and Longer Context

Tobias van der Werff, Rik van Noord, Antonio Toral

Research output: Academic, peer-reviewed

Abstract

We address the task of automatically distinguishing between human-translated (HT) and machine-translated (MT) texts. Following recent work, we fine-tune pre-trained language models (LMs) to perform this task. Our work differs in that we use state-of-the-art pre-trained LMs, and in that we use the test sets of the WMT news shared tasks as training data, to ensure the sentences were not seen during training of the MT system itself. Moreover, we analyse performance for a number of different experimental setups, such as adding translationese data, going beyond the sentence level and normalizing punctuation. We show that (i) choosing a state-of-the-art LM can make quite a difference: our best baseline system (DeBERTa) outperforms both BERT and RoBERTa by over 3% accuracy, (ii) adding translationese data is only beneficial if there is not much data available, (iii) considerable improvements can be obtained by classifying at the document level and (iv) normalizing punctuation, and thus avoiding (some) shortcuts, has no impact on model performance.
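
The abstract mentions normalizing punctuation so that a classifier cannot rely on typographic shortcuts, but it does not say which characters are normalized or which tool is used. As an illustration only, the following minimal sketch assumes a small, hypothetical mapping from typographic quotes, dashes and ellipses to ASCII equivalents; the name `normalize_punctuation` and the contents of `PUNCT_MAP` are assumptions, not the authors' procedure.

```python
# Hypothetical mapping: the abstract does not list the characters
# that the authors normalize, so this set is an assumption.
PUNCT_MAP = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u201e": '"',    # low double quotation mark
    "\u00ab": '"',    # left-pointing guillemet
    "\u00bb": '"',    # right-pointing guillemet
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TABLE = str.maketrans(PUNCT_MAP)


def normalize_punctuation(text: str) -> str:
    """Replace typographic punctuation with ASCII equivalents."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(normalize_punctuation("\u201cHello\u201d \u2013 it\u2019s fine\u2026"))
```

Applying the same function to both HT and MT sentences before fine-tuning would remove one surface cue (for example, an MT system that always emits straight quotes) that could otherwise separate the two classes.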
Original language: English
Title: Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
Editors: Helena Moniz, Lieve Macken, Andrew Rufener, Loïc Barrault, Marta R. Costa-jussà, Christophe Declercq, Maarit Koponen, Ellie Kemp, Spyridon Pilos, Mikel L. Forcada, Carolina Scarton, Joachim van den Bogaert, Joke Daems, Arda Tezcan, Bram Vanroy, Margot Fonteyne
Place of publication: Ghent, Belgium
Publisher: European Association for Machine Translation
Pages: 161-170
Number of pages: 10
Status: Published - 1 Jun 2022
