Abstract
As socially unacceptable language becomes pervasive on social media platforms, the need for automatic content moderation becomes more pressing. This contribution introduces the Dutch Abusive Language Corpus (DALC v1.0), a new dataset with tweets manually annotated for abusive language. The resource addresses a gap in language resources for Dutch and adopts a multi-layer annotation scheme modeling the explicitness and the target of the abusive messages. Baseline experiments on all annotation layers have been conducted, achieving a macro F1 score of 0.748 for binary classification of the explicitness layer and 0.489 for target classification.
Original language | English |
---|---|
Title of host publication | Proceedings of the 5th Workshop on Online Abuse and Harm |
Editors | Aida Mostafazadeh Davani, Douwe Kiela, Mathias Lambert, Bertie Vidgen, Vinodkumar Prabhakaran, Zeerak Waseem |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 54-66 |
Number of pages | 13 |
DOIs | |
Publication status | Published - 27-Jul-2021 |
Keywords
- language models
- hate speech
- offensive language