The Potential of Speech Features to Discriminate between Original and Machine-Translated Texts

Yongjian Chen, Mireia Farrús, Antonio Toral

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Discriminating between original texts and machine translations involves identifying whether a text was originally authored in the target language or generated through machine translation. To our knowledge, all methods to date depend exclusively on text-based features. In this study, we move beyond this unimodal approach by incorporating speech features. Machine-translated texts display linguistic deviations from original texts, such as those in lexicon and syntax, which can also manifest in speech characteristics. We evaluate the effectiveness of using text features, speech features, and their bimodal fusion to train classifiers capable of discerning original from machine-translated texts. Additionally, we explore various classification algorithms and fusion techniques. Our results show that speech features alone surpass chance accuracy, while combining text and speech features enhances performance beyond text-only methods. Furthermore, although no single classification or fusion method proves consistently superior, advanced fusion techniques outperform simple feature concatenation.
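The abstract contrasts simple feature concatenation with more advanced fusion but does not name the advanced techniques or the classifiers. The sketch below only illustrates the two basic ideas involved: early fusion (joining each sample's text and speech feature vectors) and late fusion (a weighted average of per-modality probabilities). The nearest-centroid classifier, the function names, the 0.5 weight and threshold, and the toy feature values are all hypothetical choices for illustration, not the authors' method or data.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]  # one feature vector per text sample


def concat_fusion(text_feats: Sequence[Sequence[float]],
                  speech_feats: Sequence[Sequence[float]]) -> List[Vector]:
    """Early fusion (hypothetical helper): concatenate each sample's text and speech vectors."""
    if len(text_feats) != len(speech_feats):
        raise ValueError("text and speech features must align sample by sample")
    return [list(t) + list(s) for t, s in zip(text_feats, speech_feats)]


def fit_centroids(xs: Sequence[Vector], ys: Sequence[int]) -> Dict[int, Vector]:
    """Stand-in classifier: mean vector per label (0 = original, 1 = machine-translated)."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for x, y in zip(xs, ys):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [a + b for a, b in zip(sums[y], x)]
        counts[y] += 1
    return {y: [v / counts[y] for v in sums[y]] for y in sums}


def predict(centroids: Dict[int, Vector], x: Vector) -> int:
    """Return the label whose centroid is nearest to x in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


def late_fusion(p_text: float, p_speech: float, w_text: float = 0.5) -> int:
    """Late fusion (hypothetical): weighted mean of per-modality P(machine-translated), thresholded at 0.5."""
    if not 0.0 <= w_text <= 1.0:
        raise ValueError("w_text must lie in [0, 1]")
    p = w_text * p_text + (1.0 - w_text) * p_speech
    return int(p >= 0.5)


if __name__ == "__main__":
    # Invented toy data: two text features and one speech feature per sample.
    text = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
    speech = [[0.0], [0.1], [1.0], [0.9]]
    labels = [0, 0, 1, 1]
    model = fit_centroids(concat_fusion(text, speech), labels)
    print(predict(model, [0.85, 0.85, 0.95]))  # 1
    print(late_fusion(0.7, 0.4))  # 1
```

Early fusion lets a single classifier see interactions between text and speech features, while late fusion trains one model per modality and merges only their outputs; either could be swapped for a learned fusion layer, which is one plausible reading of "advanced fusion" but is not confirmed by this abstract.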

Original language: English
Title of host publication: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings
Editors: Bhaskar D Rao, Isabel Trancoso, Gaurav Sharma, Neelesh B. Mehta
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 5
ISBN (Electronic): 9798350368741
DOIs
Publication status: Published - 2025
Event: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Hyderabad, India
Duration: 6-Apr-2025 to 11-Apr-2025

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Country/Territory: India
City: Hyderabad
Period: 06/04/2025 to 11/04/2025

Keywords

  • acoustic fusion
  • machine translationese
  • original texts
  • text classification
  • textual
