Benchmarking Offensive and Abusive Language in Dutch Tweets

Tommaso Caselli, Hylke van der Veen

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review

Abstract

We present an extensive evaluation of different fine-tuned models to detect instances of offensive and abusive language in Dutch across three benchmarks: a standard held-out test set, a task-agnostic functional benchmark, and a dynamic test set. We also investigate the use of data cartography to identify high-quality training data. Our results show that the manually annotated data used to train the models is of relatively good quality, while highlighting some critical weaknesses. We also found that trained models port well across the same language phenomena. As for data cartography, we found a positive impact only on the functional benchmark and when selecting data per annotated dimension rather than using the entire training material.

Original language: English
Title of host publication: ACL 2023 - 7th Workshop on Online Abuse and Harms, WOAH 2023 - Proceedings of the Workshop
Editors: Yi-Ling Chung, Aida Mostafazadeh Davani, Debora Nozza, Paul Rottger, Zeerak Talat
Publisher: Association for Computational Linguistics, ACL Anthology
Pages: 69-84
Number of pages: 16
ISBN (Electronic): 9781959429814
Publication status: Published - 2023
Event: 7th Workshop on Online Abuse and Harms, WOAH 2023, co-located with ACL 2023 - Toronto, Canada
Duration: 13-Jul-2023 → …

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: 7th Workshop on Online Abuse and Harms, WOAH 2023, co-located with ACL 2023
Country/Territory: Canada
City: Toronto
Period: 13/07/2023 → …
