Abstract
Acoustic-to-articulatory inversion (AAI) is the process of inferring vocal tract movements from acoustic speech signals. Despite its diverse potential applications, AAI research in languages other than English is scarce due to the challenges of collecting articulatory data. In recent years, self-supervised learning (SSL) based representations have shown great potential for addressing low-resource tasks. We train AAI systems on wav2vec 2.0 representations and English articulatory data, and investigate their effectiveness for a different language: Dutch. Results show that using mms-1b features can reduce the cross-lingual performance drop to less than 30%. We found that increasing model size, selecting intermediate rather than final layers, and including more pre-training data improved AAI performance. By contrast, fine-tuning on an ASR task did not. Our results therefore highlight promising prospects for implementing SSL in AAI for languages with limited articulatory data.
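The record itself contains no code; as a rough illustration of the AAI setup the abstract describes, below is a minimal sketch in pure Python. Synthetic frame-level features stand in for wav2vec 2.0 layer activations, and a synthetic one-dimensional trajectory stands in for an EMA articulator trace; the ridge-regression probe is a common lightweight AAI baseline, not necessarily the authors' model, and all names and dimensions here are illustrative assumptions.

```python
import random

def fit_ridge(X, y, lam=1e-3):
    """Closed-form ridge regression: solve (X^T X + lam*I) w = X^T y.

    X: list of feature vectors (one per speech frame), y: list of targets
    (one articulator coordinate per frame). Returns the weight vector w.
    """
    d = len(X[0])
    # Normal-equation matrix A = X^T X + lam*I and right-hand side b = X^T y.
    A = [[sum(X[n][i] * X[n][j] for n in range(len(X)))
          + (lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    b = [sum(X[n][i] * y[n] for n in range(len(X))) for i in range(d)]
    # Gaussian elimination with partial pivoting.
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    w = [0.0] * d
    for r in range(d - 1, -1, -1):
        w[r] = (b[r] - sum(A[r][c] * w[c] for c in range(r + 1, d))) / A[r][r]
    return w

def predict(X, w):
    """Map each frame's feature vector to a predicted articulator position."""
    return [sum(x_i * w_i for x_i, w_i in zip(x, w)) for x in X]

# Demo on synthetic data (placeholders for SSL features / EMA traces).
random.seed(0)
dim, frames = 8, 300
w_true = [random.uniform(-1, 1) for _ in range(dim)]
feats = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(frames)]
traj = [sum(w * f for w, f in zip(w_true, x)) + random.gauss(0, 0.01)
        for x in feats]
w_hat = fit_ridge(feats, traj, lam=1e-4)
preds = predict(feats, w_hat)
```

In an actual cross-lingual experiment along the lines the abstract reports, `feats` would come from a chosen intermediate transformer layer of a pretrained model (e.g. mms-1b), the probe would be trained on English EMA data, and `predict` would then be evaluated on Dutch speech.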
Original language | English |
---|---|
Title of host publication | Proceedings of Interspeech 2024 |
Publisher | ISCA |
Pages | 4603-4607 |
Number of pages | 5 |
DOIs | |
Publication status | Published - Sept-2024 |
Event | Interspeech 2024, Kos, Greece (1-Sept-2024 → 5-Sept-2024) |
Conference
Conference | Interspeech 2024 |
---|---|
Country/Territory | Greece |
City | Kos |
Period | 01/09/2024 → 05/09/2024 |