Evaluation and Adaptation of Neural Language Models for Under-Resourced Languages

Research output: Thesis › Thesis fully internal (DIV)

Abstract

Language models are now commonly used by researchers, industry, and anyone interested. However, language models of all sizes and types are primarily developed for English, while efforts on other languages lag behind. This dissertation explores how well non-English language models perform and how to adapt models for higher-resource languages to lower-resource languages. With a focus on Dutch, we show high cross-lingual performance. Moreover, we find that language models can be adapted to other higher-resource languages (Dutch and Italian) or to low-resource languages (Gronings and Frisian) with minimal extra training. Finally, we compare how language similarity affects cross-lingual performance and find that previously reported low performance can be caused by the use of English as the source language.
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
  • University of Groningen
Supervisors/Advisors
  • Nissim, Malvina, Supervisor
  • Wieling, Martijn, Supervisor
Award date: 6-Jun-2024
Place of Publication: [Groningen]
Publication status: Published - 2024
