Open Source Speech and Language Resources for Frisian

Emre Yilmaz, Henk van den Heuvel, Jelske Dijkstra, Hans Van de Velde, Frederik Kampstra, Jouke Algra, David van Leeuwen

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Academic > peer-review

Abstract

In this paper, we present several open source speech and language resources for the under-resourced Frisian language. Frisian is mostly spoken in the province of Fryslân, which is located in the north of the Netherlands. The native speakers of Frisian are Frisian-Dutch bilinguals and often code-switch in daily conversations. The resources presented in this paper include a code-switching speech database containing radio broadcasts, a phonetic lexicon with more than 70k words, and a language model trained on a text corpus with more than 38M words. With this contribution, we aim to share the Frisian resources we have collected in the scope of the FAME! project, in which a spoken document retrieval system is built for the disclosure of the regional broadcaster’s radio archives. These resources enable research on code-switching and on longitudinal speech and language change. Moreover, a sample automatic speech recognition (ASR) recipe for the Kaldi toolkit will be provided online to facilitate Frisian ASR research.
Original language: English
Title of host publication: Proceedings Interspeech 2016
Place of Publication: San Francisco, CA, USA
Pages: 1536-1540
Number of pages: 5
Publication status: Published - 2016
Externally published: Yes
Event: Interspeech 2016 - San Francisco, United States
Duration: 8-Sep-2016 to 12-Sep-2016

Conference

Conference: Interspeech 2016
Country/Territory: United States
City: San Francisco
Period: 08/09/2016 to 12/09/2016
