In this paper, we present several open-source speech and language resources for the under-resourced Frisian language. Frisian is mostly spoken in the province of Fryslân, which is located in the north of the Netherlands. Native speakers of Frisian are Frisian-Dutch bilinguals and often code-switch in daily conversations. The resources presented in this paper include a code-switching speech database containing radio broadcasts, a phonetic lexicon with more than 70k words, and a language model trained on a text corpus with more than 38M words. With this contribution, we aim to share the Frisian resources we have collected in the scope of the FAME! project, in which a spoken document retrieval system is built for the disclosure of the regional broadcaster’s radio archives. These resources enable research on code-switching and longitudinal speech and language change. Moreover, a sample automatic speech recognition (ASR) recipe for the Kaldi toolkit will be provided online to facilitate Frisian ASR research.
|Title of host publication||Proceedings Interspeech 2016|
|Place of Publication||San Francisco, CA, USA|
|Number of pages||5|
|Publication status||Published - 2016|
|Event||Interspeech 2016 - San Francisco, United States (8-Sep-2016 → 12-Sep-2016)|
|Period||08/09/2016 → 12/09/2016|