Exploring Self-Supervised Speech Representations for Cross-lingual Acoustic-to-Articulatory Inversion

Research output: Conference contribution in book/report/conference proceedings (academic, peer-reviewed)


Abstract

Acoustic-to-articulatory inversion (AAI) is the process of inferring vocal tract movements from acoustic speech signals. Despite its diverse potential applications, AAI research in languages other than English is scarce due to the challenges of collecting articulatory data. In recent years, representations based on self-supervised learning (SSL) have shown great potential for addressing low-resource tasks. We utilize wav2vec 2.0 representations and English articulatory data to train AAI systems and investigate their effectiveness for a different language: Dutch. Results show that using mms-1b features can reduce the cross-lingual performance drop to less than 30%. We found that increasing model size, selecting intermediate rather than final layers, and including more pre-training data improved AAI performance; by contrast, fine-tuning on an ASR task did not. Our results therefore highlight promising prospects for applying SSL to AAI in languages with limited articulatory data.
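The recipe the abstract describes, extracting frame-level SSL features (e.g., from an intermediate wav2vec 2.0 or mms-1b layer) and regressing them onto articulatory trajectories, can be illustrated with a simple linear probe. The sketch below is purely illustrative and is not the paper's model: the feature dimension, articulatory channel count, synthetic data, and closed-form ridge solver are all assumptions, with per-channel Pearson correlation as the evaluation metric commonly used in AAI work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions): large wav2vec 2.0 models emit
# 1024-dim frame features; EMA articulatory data often has ~12 channels
# (x/y coordinates of 6 sensors).
T, D, A = 500, 1024, 12

# Stand-ins for real data: SSL features from an assumed intermediate
# layer, and time-aligned articulatory trajectories (synthetic here).
H = rng.standard_normal((T, D))                    # frame features
true_W = rng.standard_normal((D, A)) / np.sqrt(D)  # synthetic mapping
Y = H @ true_W + 0.1 * rng.standard_normal((T, A))

# Closed-form ridge regression: W = (H^T H + lam*I)^{-1} H^T Y
lam = 1.0
W = np.linalg.solve(H.T @ H + lam * np.eye(D), H.T @ Y)

pred = H @ W
# Per-channel Pearson correlation between predicted and reference
# trajectories, a standard AAI evaluation metric.
corr = np.array([np.corrcoef(pred[:, a], Y[:, a])[0, 1] for a in range(A)])
print(f"mean correlation: {corr.mean():.3f}")
```

In practice one would replace the synthetic features with hidden states taken from an intermediate transformer layer (the abstract reports these outperform the final layer) and the synthetic targets with recorded articulatory data, typically using a learned nonlinear regressor rather than this linear probe.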
Original language: English
Title of host publication: Proceedings of Interspeech 2024
Publisher: ISCA
Pages: 4603-4607
Number of pages: 5
Publication status: Published - September 2024
Event: Interspeech 2024 - Kos, Greece
Duration: 1 September 2024 - 5 September 2024

Conference

Conference: Interspeech 2024
Country/Territory: Greece
City: Kos
Period: 01/09/2024 - 05/09/2024
