Abstract
We compare a PHOIBLE-based phone mapping method with phonological features input in transfer learning for TTS in low-resource languages. We use diverse source languages (English, Finnish, Hindi, Japanese, and Russian) and target languages (Bulgarian, Georgian, Kazakh, Swahili, Urdu, and Uzbek) to test the language-independence of the methods and enhance the findings’ applicability. We use Character Error Rates from automatic speech recognition and predicted Mean Opinion Scores for evaluation. Results show that both phone mapping and features input improve the output quality and that the latter performs better, but these effects also depend on the specific language combination. We also compare the recently proposed Angular Similarity of Phone Frequencies (ASPF) with a family tree-based distance measure as a criterion to select source languages in transfer learning. ASPF proves effective if label-based phone input is used, while the language distance does not have the expected effects.
Original language | English |
---|---|
Title of host publication | 12th ISCA Speech Synthesis Workshop (SSW2023) |
Publisher | ISCA |
Pages | 21-26 |
Number of pages | 6 |
Publication status | Published - 26-Aug-2023 |
Event | 12th ISCA Speech Synthesis Workshop (SSW2023) - Grenoble, France; Duration: 26-Aug-2023 → 28-Aug-2023 |
Conference
Conference | 12th ISCA Speech Synthesis Workshop (SSW2023) |
---|---|
Country/Territory | France |
City | Grenoble |
Period | 26/08/2023 → 28/08/2023 |