TY - GEN
T1 - “Zo Grof!”
T2 - 6th Workshop on Online Abuse and Harms, WOAH 2022
AU - Ruitenbeek, Ward
AU - Zwart, Victor
AU - van der Noord, Robin
AU - Gnezdilov, Zhenja
AU - Caselli, Tommaso
N1 - Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022/7
Y1 - 2022/7
AB - This paper presents a comprehensive corpus for the study of socially unacceptable language in Dutch. The corpus extends and revises an existing resource with more data and introduces a new annotation dimension for offensive language, making it a unique resource for Dutch. Each language phenomenon (abusive and offensive language) in the corpus has been annotated with a multilayer annotation scheme modelling the explicitness and the target(s) of the abuse/offence in the message. We conducted a new set of experiments with different classification algorithms on all annotation dimensions. Monolingual pre-trained language models prove to be the best systems, obtaining a macro-average F1 of 0.828 for binary classification of offensive language and 0.579 for the targets of offensive messages. Furthermore, the best system obtains a macro-average F1 of 0.637 for distinguishing between abusive and offensive messages.
UR - http://www.scopus.com/inward/record.url?scp=85139098903&partnerID=8YFLogxK
U2 - 10.18653/v1/2022.woah-1.5
DO - 10.18653/v1/2022.woah-1.5
M3 - Conference contribution
AN - SCOPUS:85139098903
T3 - WOAH 2022 - 6th Workshop on Online Abuse and Harms, Proceedings of the Workshop
SP - 40
EP - 56
BT - WOAH 2022 - 6th Workshop on Online Abuse and Harms, Proceedings of the Workshop
A2 - Narang, Kanika
A2 - Davani, Aida Mostafazadeh
A2 - Mathias, Lambert
A2 - Vidgen, Bertie
A2 - Talat, Zeerak
PB - Association for Computational Linguistics, ACL Anthology
Y2 - 14 July 2022
ER -