TY - GEN
T1 - The Potential of Speech Features to Discriminate between Original and Machine-Translated Texts
AU - Chen, Yongjian
AU - Farrús, Mireia
AU - Toral, Antonio
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Discriminating between original texts and machine translations involves identifying whether a text was originally authored in the target language or generated through machine translation. To our knowledge, all methods to date depend exclusively on text-based features. In this study, we move beyond this unimodal approach by incorporating speech features. Machine-translated texts display linguistic deviations from original texts, such as those in lexicon and syntax, which can also manifest in speech characteristics. We evaluate the effectiveness of using text features, speech features, and their bimodal fusion to train classifiers capable of discerning original from machine-translated texts. Additionally, we explore various classification algorithms and fusion techniques. Our results show that speech features alone surpass chance accuracy, while combining text and speech features enhances performance beyond text-only methods. Furthermore, although no single classification or fusion method proves consistently superior, advanced fusion techniques outperform simple feature concatenation.
AB - Discriminating between original texts and machine translations involves identifying whether a text was originally authored in the target language or generated through machine translation. To our knowledge, all methods to date depend exclusively on text-based features. In this study, we move beyond this unimodal approach by incorporating speech features. Machine-translated texts display linguistic deviations from original texts, such as those in lexicon and syntax, which can also manifest in speech characteristics. We evaluate the effectiveness of using text features, speech features, and their bimodal fusion to train classifiers capable of discerning original from machine-translated texts. Additionally, we explore various classification algorithms and fusion techniques. Our results show that speech features alone surpass chance accuracy, while combining text and speech features enhances performance beyond text-only methods. Furthermore, although no single classification or fusion method proves consistently superior, advanced fusion techniques outperform simple feature concatenation.
KW - acoustic fusion
KW - machine translationese
KW - original texts
KW - text classification
KW - textual
UR - http://www.scopus.com/inward/record.url?scp=105003887009&partnerID=8YFLogxK
U2 - 10.1109/ICASSP49660.2025.10887578
DO - 10.1109/ICASSP49660.2025.10887578
M3 - Conference contribution
AN - SCOPUS:105003887009
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
BT - 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings
A2 - Rao, Bhaskar D.
A2 - Trancoso, Isabel
A2 - Sharma, Gaurav
A2 - Mehta, Neelesh B.
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Y2 - 6 April 2025 through 11 April 2025
ER -