Encoding of lexical tone in self-supervised models of spoken language

  • Gaofei Shen
  • Michaela Watkins
  • Afra Alishahi
  • Arianna Bisazza
  • Grzegorz Chrupała

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

5 Citations (Scopus)
122 Downloads (Pure)

Abstract

Interpretability research has shown that self-supervised Spoken Language Models (SLMs) encode a wide variety of features in human speech, from the acoustic, phonetic, phonological, syntactic and semantic levels, to speaker characteristics. The bulk of prior research on representations of phonology has focused on segmental features such as phonemes; the encoding of suprasegmental phonology (such as tone and stress patterns) in SLMs is not yet well understood. Tone is a suprasegmental feature that is present in more than half of the world's languages. This paper aims to analyze the tone encoding capabilities of SLMs, using Mandarin and Vietnamese as case studies. We show that SLMs encode lexical tone to a significant degree even when they are trained on data from non-tonal languages. We further find that SLMs behave similarly to native and non-native human participants in tone and consonant perception studies, but they do not follow the same developmental trajectory.
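The abstract describes probing SLM representations for lexical tone. As a rough illustration of the general probing methodology only (not the authors' exact setup), the sketch below trains a linear softmax probe on synthetic stand-in features for four Mandarin tone classes; in an actual study, the features would be hidden states extracted from a pretrained SLM, and the labels would come from annotated speech.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for SLM frame embeddings: four tone classes,
# each a Gaussian cloud in a 32-dim "representation" space. A real
# probe would use hidden states from a pretrained speech model.
n_per_class, dim, n_classes = 200, 32, 4
means = rng.normal(0.0, 1.0, size=(n_classes, dim))
X = np.vstack([rng.normal(means[c], 1.0, size=(n_per_class, dim))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)

# Shuffle and split into train/test halves.
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]
split = len(y) // 2
Xtr, ytr, Xte, yte = X[:split], y[:split], X[split:], y[split:]

# Linear softmax probe trained by gradient descent: if a simple linear
# classifier recovers the tone labels well above chance, the
# representation can be said to encode tone.
W = np.zeros((dim, n_classes))
b = np.zeros(n_classes)
onehot = np.eye(n_classes)[ytr]
for _ in range(300):
    logits = Xtr @ W + b
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = (p - onehot) / len(ytr)
    W -= 0.5 * (Xtr.T @ grad)
    b -= 0.5 * grad.sum(axis=0)

acc = (np.argmax(Xte @ W + b, axis=1) == yte).mean()
print(f"probe accuracy: {acc:.2f} (chance = {1/n_classes:.2f})")
```

Comparing probe accuracy across layers, or across models trained on tonal vs. non-tonal languages, is the kind of contrast the paper's analysis relies on.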
Original language: English
Title of host publication: The 2024 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: Proceedings of the Conference Volume 1: Long Papers
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Publisher: Association for Computational Linguistics (ACL)
Pages: 4250-4261
Volume: 1
ISBN (Print): 979-8-89176-114-8
DOIs
Publication status: Published - 2024
