A Longitudinal Bilingual Frisian-Dutch Radio Broadcast Database Designed for Code-switching Research

Emre Yilmaz, Maaike Andringa, Sigrid Kingma, Jelske Dijkstra, Frits van der Kuip, Hans Van de Velde, Frederik Kampstra, Jouke Algra, Henk van den Heuvel, David van Leeuwen

Research output: Chapter in Book/Report/Conference proceeding › Chapter › Academic › peer-review

Abstract

We present a new speech database containing 18.5 hours of annotated radio broadcasts in the Frisian language. Frisian is mostly spoken in the province of Fryslân, and it is the second official language of the Netherlands. The recordings were collected from the archives of Omrop Fryslân, the regional public broadcaster of the province of Fryslân. The database covers a time span of almost 50 years. Native speakers of Frisian are mostly bilingual and often code-switch in daily conversations due to the extensive influence of the Dutch language. Considering the longitudinal and code-switching nature of the data, an appropriate annotation protocol has been designed, and the data has been manually annotated with orthographic transcriptions, speaker identities, dialect information, code-switching details and background noise/music information.
Original language: English
Title of host publication: Proceedings of the International Conference on Language Resources and Evaluation (LREC) 2016
Place of publication: Portorož, Slovenia
Publisher: European Language Resources Association (ELRA)
Pages: 4666-4669
Number of pages: 4
ISBN (Print): 9782951740891
Publication status: Published - 2016
Externally published: Yes