TY - JOUR
T1 - A Set of Recommendations for Assessing Human–Machine Parity in Language Translation
AU - Läubli, Samuel
AU - Castilho, Sheila
AU - Neubig, Graham
AU - Sennrich, Rico
AU - Shen, Qinlan
AU - Toral, Antonio
N1 - Publisher Copyright: © 2020 AI Access Foundation. All rights reserved.
PY - 2020
Y1 - 2020/3
N2 - The quality of machine translation has increased remarkably over the past years, to the degree that it was found to be indistinguishable from professional human translation in a number of empirical investigations. We reassess Hassan et al.'s 2018 investigation into Chinese to English news translation, showing that the finding of human–machine parity was owed to weaknesses in the evaluation design—which is currently considered best practice in the field. We show that the professional human translations contained significantly fewer errors, and that perceived quality in human evaluation depends on the choice of raters, the availability of linguistic context, and the creation of reference translations. Our results call for revisiting current best practices to assess strong machine translation systems in general and human–machine parity in particular, for which we offer a set of recommendations based on our empirical findings.
UR - https://jair.org/index.php/jair/about#jair-license
U2 - 10.1613/jair.1.11371
DO - 10.1613/jair.1.11371
M3 - Article
SN - 1076-9757
VL - 67
SP - 653
EP - 672
JO - Journal of Artificial Intelligence Research
JF - Journal of Artificial Intelligence Research
ER -