DivEMT: Neural Machine Translation Post-Editing Effort Across Typologically Diverse Languages

Dataset

Description

DivEMT is the first publicly available post-editing study of Neural Machine Translation (NMT) over a typologically diverse set of target languages. Using a strictly controlled setup, 18 professional translators were instructed to translate or post-edit the same set of English documents into Arabic, Dutch, Italian, Turkish, Ukrainian, and Vietnamese. During the process, their edits, keystrokes, editing times, and pauses were recorded, enabling an in-depth, cross-lingual evaluation of NMT quality and post-editing effectiveness. Using this new dataset, we assess the impact of two state-of-the-art NMT systems, Google Translate and the multilingual mBART-50 model, on translation productivity.
Date made available: May-2022
Publisher: University of Groningen
Date of data production: Jan-2022 - May-2022

Keywords on Datasets

  • machine translation
  • translation
  • post-editing
  • productivity
  • linguistic typology
