“Zo Grof!”: A Comprehensive Corpus for Offensive and Abusive Language in Dutch

Ward Ruitenbeek, Victor Zwart, Robin van der Noord, Zhenja Gnezdilov, Tommaso Caselli

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

This paper presents a comprehensive corpus for the study of socially unacceptable language in Dutch. The corpus extends and revises an existing resource with more data and introduces a new annotation dimension for offensive language, making it a unique resource in the Dutch language landscape. Each language phenomenon (abusive and offensive language) in the corpus has been annotated with a multilayer annotation scheme modelling the explicitness and the target(s) of the abuse/offence in the message. We have conducted a new set of experiments with different classification algorithms on all annotation dimensions. Monolingual Pre-Trained Language Models prove to be the best systems, obtaining a macro-average F1 of 0.828 for binary classification of offensive language, and 0.579 for the targets of offensive messages. Furthermore, the best system obtains a macro-average F1 of 0.637 for distinguishing between abusive and offensive messages.

Original language: English
Title of host publication: WOAH 2022 - 6th Workshop on Online Abuse and Harms, Proceedings of the Workshop
Editors: Kanika Narang, Aida Mostafazadeh Davani, Lambert Mathias, Bertie Vidgen, Zeerak Talat
Publisher: Association for Computational Linguistics, ACL Anthology
Pages: 40-56
Number of pages: 17
ISBN (Electronic): 9781955917841
Publication status: Published - Jul-2022
Event: 6th Workshop on Online Abuse and Harms, WOAH 2022 - Seattle, United States
Duration: 14-Jul-2022 → …

Publication series

Name: WOAH 2022 - 6th Workshop on Online Abuse and Harms, Proceedings of the Workshop

Conference

Conference: 6th Workshop on Online Abuse and Harms, WOAH 2022
Country/Territory: United States
City: Seattle
Period: 14/07/2022 → …
