Encoding of lexical tone in self-supervised models of spoken language

Abstract
Interpretability research has shown that self-supervised Spoken Language Models (SLMs) encode a wide variety of features in human speech, from the acoustic, phonetic, phonological, syntactic and semantic levels, to speaker characteristics. The bulk of prior research on representations of phonology has focused on segmental features such as phonemes; the encoding of suprasegmental phonology (such as tone and stress patterns) in SLMs is not yet well understood. Tone is a suprasegmental feature that is present in more than half of the world's languages. This paper aims to analyze the tone encoding capabilities of SLMs, using Mandarin and Vietnamese as case studies. We show that SLMs encode lexical tone to a significant degree even when they are trained on data from non-tonal languages. We further find that SLMs behave similarly to native and non-native human participants in tone and consonant perception studies, but they do not follow the same developmental trajectory.
| Original language | English |
|---|---|
| Title of host publication | The 2024 Conference of the North American Chapter of the Association for Computational Linguistics |
| Subtitle of host publication | Proceedings of the Conference Volume 1: Long Papers |
| Editors | Kevin Duh, Helena Gomez, Steven Bethard |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 4250-4261 |
| Volume | 1 |
| ISBN (Print) | 979-8-89176-114-8 |
| DOIs | |
| Publication status | Published - 2024 |