A Systematic Review and Analysis of Multilingual Data Strategies in Text-to-Speech for Low-Resource Languages

  • Phat Do*
  • Matt Coler
  • Jelske Dijkstra
  • Esther Klabbers
  • *Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

    Abstract

    We provide a systematic review of past studies that use multilingual data for text-to-speech (TTS) of low-resource languages (LRLs). We focus on the strategies these studies use to incorporate multilingual data and on how those strategies affect output speech quality. To investigate the difference in output quality between corresponding monolingual and multilingual models, we propose a novel measure that allows this difference to be compared across the included studies and their various evaluation metrics. This measure, called the Multilingual Model Effect (MLME), is found to be affected by: acoustic model architecture, the difference ratio of target language data between corresponding multilingual and monolingual experiments, the balance ratio of target language data to total data, and the amount of target language data used. These findings can serve as a reference for data strategies in future experiments with multilingual TTS models for LRLs. Language family classification, despite being widely used, is not found to be an effective criterion for selecting source languages.
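The two data ratios named in the abstract can be illustrated with a minimal sketch. The abstract states only that the balance ratio relates target language data to total data; the exact form of the difference ratio is not given here, so the version below (target data in the multilingual experiment divided by target data in the monolingual experiment) is an assumption, as are the function names and the example amounts. The MLME itself is not computed, because its definition does not appear in this record.

```python
def balance_ratio(target_amount: float, total_amount: float) -> float:
    """Share of the training data that belongs to the target language."""
    if total_amount <= 0:
        raise ValueError("total_amount must be positive")
    if not 0 <= target_amount <= total_amount:
        raise ValueError("target_amount must lie between 0 and total_amount")
    return target_amount / total_amount


def difference_ratio(target_multi: float, target_mono: float) -> float:
    """ASSUMED form: target data used in the multilingual experiment
    relative to target data used in the corresponding monolingual one."""
    if target_mono <= 0:
        raise ValueError("target_mono must be positive")
    return target_multi / target_mono


if __name__ == "__main__":
    # Hypothetical amounts (same unit throughout, e.g. hours of speech).
    print(balance_ratio(target_amount=2.0, total_amount=8.0))
    print(difference_ratio(target_multi=1.0, target_mono=2.0))
```

Both functions take amounts in any consistent unit; the ratios are unitless, which is what makes them comparable across studies that report data sizes differently.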
    Original language: English
    Title of host publication: Proc. Interspeech 2021
    Publisher: ISCA
    Pages: 16-20
    Number of pages: 5
    DOIs
    Publication status: Published - 30-Aug-2021
    Event: Interspeech 2021 - Brno, Czech Republic
    Duration: 30-Aug-2021 to 3-Sept-2021

    Conference

    Conference: Interspeech 2021
    Country/Territory: Czech Republic
    City: Brno
    Period: 30/08/2021 to 03/09/2021

    Keywords

    • text-to-speech
    • speech synthesis
    • low-resource languages
    • multilingual synthesis
    • cross-lingual synthesis
