Abstract
In this paper, we present several open-source speech and language resources for the under-resourced Frisian language. Frisian is mostly spoken in the province of Fryslân, in the north of the Netherlands. Native speakers of Frisian are bilingual in Frisian and Dutch and often code-switch in daily conversation. The resources presented in this paper include a code-switching speech database of radio broadcasts, a phonetic lexicon of more than 70k words, and a language model trained on a text corpus of more than 38M words. With this contribution, we aim to share the Frisian resources we have collected within the scope of the FAME! project, in which a spoken document retrieval system is built to disclose the radio archives of the regional broadcaster. These resources enable research on code-switching and on longitudinal speech and language change. Moreover, a sample automatic speech recognition (ASR) recipe for the Kaldi toolkit will be provided online to facilitate research on Frisian ASR.
Original language | English |
---|---|
Title of host publication | Proceedings Interspeech 2016 |
Place of Publication | San Francisco, CA, USA |
Pages | 1536-1540 |
Number of pages | 5 |
Publication status | Published - 2016 |
Externally published | Yes |
Event | Interspeech 2016 - San Francisco, United States (8-Sep-2016 → 12-Sep-2016) |