Representing Low-Resource Languages and Dialects: Improved Neural Methods for Spoken Language Processing

Martijn Bartelds

Research output: Thesis › Thesis fully internal (DIV)


Abstract

Languages are fundamental to human communication and serve as a means to express social and cultural values. However, many people treat languages as homogeneous entities, disregarding the fact that they are often composed of multiple varieties. These language varieties may be tied to certain geographical locations or the cultural identity of the speakers.

Studying language variation can thus provide valuable insights into how language varieties relate to their linguistic communities. Most language varieties do not correspond to administrative boundaries, such as provinces or states within nations, and neighboring varieties often transition gradually.

In this dissertation, we presented a new method to describe and model linguistic diversity. Specifically, we leveraged deep learning models (artificial neural networks) to quantify differences between the pronunciations of speakers of different language varieties. This new method assesses the differences between language varieties more accurately and efficiently than previously used methods.

Additionally, we investigated the use of these neural network models to develop speech technology that helps empower language varieties. We developed an audio-based search algorithm that automatically identifies occurrences of a spoken search term in a large collection of spoken materials, improving access to resources that would otherwise require manual annotation. Furthermore, we presented approaches to improve speech recognition performance for several language varieties from different language families. This technology could, for example, be used to generate subtitles for videos or television broadcasts. Together, these contributions are a promising step towards the important goal of developing speech technology that is inclusive of the world’s languages.
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
  • University of Groningen
Supervisors/Advisors
  • Wieling, Martijn, Supervisor
  • Liberman, Mark, Supervisor, External person
Award date: 16-Nov-2023
Place of Publication: [Groningen]
Publication status: Published - 2023
