Fine-Grained Human Evaluation of Neural Versus Phrase-Based Machine Translation

Filip Klubička, Antonio Toral Ruiz, Víctor M. Sánchez-Cartagena

    Research output: Contribution to journal › Article › Academic › peer-review


    Abstract

    We compare three approaches to statistical machine translation (pure phrase-based, factored phrase-based and neural) by performing a fine-grained manual evaluation via error annotation of the systems' outputs. The error types in our annotation are compliant with the Multidimensional Quality Metrics (MQM), and the annotation is performed by two annotators. Inter-annotator agreement is high for such a task, and results show that the best performing system (neural) reduces the errors produced by the worst system (phrase-based) by 54%.
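    The two quantities highlighted in the abstract can be illustrated with a minimal sketch. It assumes Cohen's kappa as the agreement measure and uses made-up MQM-style labels and error counts purely for illustration; neither the labels nor the counts are taken from the paper (a 54% reduction is shown with the hypothetical pair 200 → 92 errors).

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

def relative_error_reduction(errors_worst, errors_best):
    """Fraction of the worst system's errors removed by the best system."""
    return (errors_worst - errors_best) / errors_worst

# Hypothetical MQM-style error labels assigned by two annotators to the same segments.
annotator_1 = ["Accuracy", "Fluency", "Accuracy", "Terminology", "Fluency"]
annotator_2 = ["Accuracy", "Fluency", "Accuracy", "Fluency", "Fluency"]

print(f"Cohen's kappa: {cohens_kappa(annotator_1, annotator_2):.2f}")

# Illustrative counts only: 200 errors (phrase-based) vs. 92 errors (neural) gives 54%.
print(f"Relative error reduction: {relative_error_reduction(200, 92):.0%}")
```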
    Original language: English
    Pages (from-to): 121-132
    Number of pages: 12
    Journal: The Prague Bulletin of Mathematical Linguistics
    Volume: 108
    Issue number: 1
    DOIs:
    Publication status: Published - 2017

