Abstract
Despite significant advances in automatic speech recognition (ASR) technology, the performance of such systems on dysarthric speech is still inadequate for widespread use. One key reason is the lack of sufficiently rich and diverse dysarthric speech datasets for training machine learning models that can handle all types and varieties of such speech. Motivated by this data scarcity problem, as well as by successful applications of self-supervised learning (SSL) in ASR for low-resource languages, this paper investigates and evaluates the effectiveness of three data-centric SSL training strategies for improving Dutch dysarthric speech recognition. The first strategy involves fine-tuning with both dysarthric and healthy speech data, the second with disease-specific data, and the third with speaker-specific data. The first and third strategies prove effective, while the second, though ineffective, provides valuable insights for further research.
Original language | English |
---|---|
Title of host publication | Proceedings of Interspeech 2024 |
Publisher | ISCA |
Pages | 1295-1299 |
Number of pages | 5 |
DOIs | |
Publication status | Published - 1-Sept-2024 |
Event | Interspeech 2024 - Kos, Greece; Duration: 1-Sept-2024 → 5-Sept-2024 |
Conference
Conference | Interspeech 2024 |
---|---|
Country/Territory | Greece |
City | Kos |
Period | 01/09/2024 → 05/09/2024 |
Keywords
- sarcasm
- speech acoustics