TY - GEN
T1 - Meta Learning Text-to-Speech Synthesis in over 7000 Languages
AU - Lux, Florian
AU - Meyer, Sarina
AU - Behringer, Lyonel
AU - Zalkow, Frank
AU - Do, Phat
AU - Coler, Matt
AU - Habets, Emanuel A. P.
AU - Vu, Ngoc Thang
PY - 2024
Y1 - 2024
AB - In this work, we take on the challenging task of building a single text-to-speech synthesis system that is capable of generating speech in over 7000 languages, many of which lack sufficient data for traditional TTS development. By leveraging a novel integration of massively multilingual pretraining and meta learning to approximate language representations, our approach enables zero-shot speech synthesis in languages without any available data. We validate our system's performance through objective measures and human evaluation across a diverse linguistic landscape. By releasing our code and models publicly, we aim to empower communities with limited linguistic resources and foster further innovation in the field of speech technology.
U2 - 10.21437/Interspeech.2024-1335
DO - 10.21437/Interspeech.2024-1335
M3 - Conference contribution
SP - 4958
EP - 4962
BT - Proceedings of Interspeech 2024
PB - ISCA
T2 - Interspeech 2024
Y2 - 1 September 2024 through 5 September 2024
ER -