TY - GEN
T1 - Enhancing Standard and Dialectal Frisian ASR
T2 - 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
AU - Amooie, Reihaneh
AU - De Vries, Wietse
AU - Hao, Yun
AU - Dijkstra, Jelske
AU - Coler, Matt
AU - Wieling, Martijn
N1 - Publisher Copyright:
© 2025 Institute of Electrical and Electronics Engineers Inc. All rights reserved.
PY - 2025
Y1 - 2025
N2 - Automatic Speech Recognition (ASR) performance for low-resource languages is still far behind that of higher-resource languages such as English, due to a lack of sufficient labeled data. State-of-the-art methods deploy self-supervised transfer learning where a model pre-trained on large amounts of data is fine-tuned using little labeled data in a target low-resource language. In this paper, we present and examine a method for fine-tuning an SSL-based model in order to improve the performance for Frisian and its regional dialects (Clay Frisian, Wood Frisian, and South Frisian). We show that Frisian ASR performance can be improved by using multilingual (Frisian, Dutch, English and German) fine-tuning data and an auxiliary language identification task. In addition, our findings show that performance on dialectal speech suffers substantially, and, importantly, that this effect is moderated by the elicitation approach used to collect the dialectal data. Our findings also particularly suggest that relying solely on standard language data for ASR evaluation may underestimate real-world performance, particularly in languages with substantial dialectal variation.
AB - Automatic Speech Recognition (ASR) performance for low-resource languages is still far behind that of higher-resource languages such as English, due to a lack of sufficient labeled data. State-of-the-art methods deploy self-supervised transfer learning where a model pre-trained on large amounts of data is fine-tuned using little labeled data in a target low-resource language. In this paper, we present and examine a method for fine-tuning an SSL-based model in order to improve the performance for Frisian and its regional dialects (Clay Frisian, Wood Frisian, and South Frisian). We show that Frisian ASR performance can be improved by using multilingual (Frisian, Dutch, English and German) fine-tuning data and an auxiliary language identification task. In addition, our findings show that performance on dialectal speech suffers substantially, and, importantly, that this effect is moderated by the elicitation approach used to collect the dialectal data. Our findings also particularly suggest that relying solely on standard language data for ASR evaluation may underestimate real-world performance, particularly in languages with substantial dialectal variation.
KW - automatic speech recognition, low-resource languages, self-supervised learning, XLS-R, dialectal speech recognition
UR - https://www.scopus.com/pages/publications/105009695075
U2 - 10.1109/ICASSP49660.2025.10889692
DO - 10.1109/ICASSP49660.2025.10889692
M3 - Conference contribution
AN - SCOPUS:105009695075
SN - 979-8-3503-6875-8
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
BT - ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
PB - IEEE
Y2 - 6 April 2025 through 11 April 2025
ER -