Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection

Phat Do, Matt Coler*, Jelske Dijkstra, Esther Klabbers

*Corresponding author for this work

Research output: Academic, peer review

Abstract

We compare using a PHOIBLE-based phone mapping method and using phonological features input in transfer learning for TTS in low-resource languages. We use diverse source languages (English, Finnish, Hindi, Japanese, and Russian) and target languages (Bulgarian, Georgian, Kazakh, Swahili, Urdu, and Uzbek) to test the language-independence of the methods and enhance the findings’ applicability. We use Character Error Rates from automatic speech recognition and predicted Mean Opinion Scores for evaluation. Results show that both phone mapping and features input improve the output quality and the latter performs better, but these effects also depend on the specific language combination. We also compare the recently-proposed Angular Similarity of Phone Frequencies (ASPF) with a family tree-based distance measure as a criterion to select source languages in transfer learning. ASPF proves effective if label-based phone input is used, while the language distance does not have expected effects.
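The two quantitative ideas named in the abstract can be illustrated with a minimal Python sketch. The Character Error Rate function uses the standard definition (character-level edit distance divided by reference length). The `angular_similarity` function is an assumption: it applies one common form of angular similarity for non-negative vectors to phone frequency counts, and the exact ASPF formula used in the paper may differ. All function names and inputs here are hypothetical and are not taken from the authors' code.

```python
import math
from collections import Counter


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Character-level Levenshtein distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(reference)


def angular_similarity(phones_a, phones_b) -> float:
    """Angular similarity of two phone frequency vectors, in [0, 1].

    Assumption: similarity = 1 - 2 * angle / pi, a common form for
    non-negative vectors; this is not confirmed to be the paper's formula.
    """
    freq_a, freq_b = Counter(phones_a), Counter(phones_b)
    inventory = set(freq_a) | set(freq_b)
    dot = sum(freq_a[p] * freq_b[p] for p in inventory)
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("both phone sequences must be non-empty")
    # Clamp to guard against floating-point drift outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi
```

In this sketch, a source language whose phone frequency vector yields a higher similarity to the target language would be preferred as the transfer-learning source; whether that matches the paper's selection procedure in detail is not stated in the abstract.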

Original language: English
Title of host publication: 12th ISCA Speech Synthesis Workshop (SSW2023)
Publisher: ISCA
Pages: 21-26
Number of pages: 6
Status: Published - 26 Aug 2023
Event: 12th ISCA Speech Synthesis Workshop (SSW2023) - Grenoble, France
Duration: 26 Aug 2023 to 28 Aug 2023

Conference

Conference: 12th ISCA Speech Synthesis Workshop (SSW2023)
Country/Territory: France
City: Grenoble
Period: 26/08/2023 to 28/08/2023
