Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection

Phat Do, Matt Coler*, Jelske Dijkstra, Esther Klabbers

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution (Academic, peer-review)

Abstract

We compare using a PHOIBLE-based phone mapping method and using phonological features input in transfer learning for TTS in low-resource languages. We use diverse source languages (English, Finnish, Hindi, Japanese, and Russian) and target languages (Bulgarian, Georgian, Kazakh, Swahili, Urdu, and Uzbek) to test the language-independence of the methods and enhance the findings’ applicability. We use Character Error Rates from automatic speech recognition and predicted Mean Opinion Scores for evaluation. Results show that both phone mapping and features input improve the output quality, and the latter performs better, but these effects also depend on the specific language combination. We also compare the recently proposed Angular Similarity of Phone Frequencies (ASPF) with a family-tree-based distance measure as a criterion to select source languages in transfer learning. ASPF proves effective if label-based phone input is used, while the language distance does not have the expected effects.
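To make the ASPF selection criterion more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes ASPF is the angular similarity between two languages' relative phone-frequency vectors, aligned over the union of their phone sets (phones absent from one language get frequency 0), and it uses the nonnegative-vector form 1 - 2θ/π so the score falls in [0, 1]; the paper's exact normalization may differ. The function names, the candidate labels `src_x` and `src_y`, and the toy phone sequences are all hypothetical.

```python
import math
from collections import Counter


def phone_frequencies(transcriptions):
    """Relative frequency of each phone over a list of phone sequences."""
    counts = Counter(p for utt in transcriptions for p in utt)
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}


def aspf(freq_a, freq_b):
    """Angular similarity of two phone-frequency vectors (assumed ASPF form).

    Assumption: vectors are aligned over the union of both phone sets and
    the nonnegative-vector angular similarity 1 - 2*theta/pi is used.
    """
    phones = sorted(set(freq_a) | set(freq_b))
    a = [freq_a.get(p, 0.0) for p in phones]
    b = [freq_b.get(p, 0.0) for p in phones]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("empty phone-frequency vector")
    cos = max(-1.0, min(1.0, dot / norm))  # clamp rounding error before acos
    theta = math.acos(cos)
    return 1.0 - 2.0 * theta / math.pi


# Hypothetical usage: pick the candidate source language whose phone
# distribution is most similar to the target's.
target = phone_frequencies([["a", "t", "a"], ["s", "a"]])
candidates = {
    "src_x": phone_frequencies([["a", "t"], ["t", "a"]]),
    "src_y": phone_frequencies([["o", "k"], ["u"]]),
}
best = max(candidates, key=lambda name: aspf(target, candidates[name]))
```

In this toy setup `src_y` shares no phones with the target, so its score is 0, while `src_x` overlaps on /a/ and /t/ and is selected. Computing frequencies from phone sequences rather than from inventories alone is what lets the measure reward overlap in frequently used phones.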

Original language: English
Title of host publication: 12th ISCA Speech Synthesis Workshop (SSW2023)
Publisher: ISCA
Pages: 21-26
Number of pages: 6
Publication status: Published, 26-Aug-2023
Event: 12th ISCA Speech Synthesis Workshop (SSW2023), Grenoble, France
Duration: 26-Aug-2023 to 28-Aug-2023

Conference

Conference: 12th ISCA Speech Synthesis Workshop (SSW2023)
Country/Territory: France
City: Grenoble
Period: 26/08/2023 to 28/08/2023
