GruPaTo at SemEval-2020 Task 12: Retraining mBERT on Social Media and Fine-tuned Offensive Language Models

Tommaso Caselli, Davide Colla, Valerio Basile, Jelena Mitrović, Michael Granitzer

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


    Abstract

    We introduce an approach to multilingual Offensive Language Detection based on the mBERT transformer model. We download extra training data from Twitter in English, Danish, and Turkish and use it to retrain the model. We then fine-tune the model on the provided training data and, in some configurations, implement a transfer-learning approach that exploits the typological relatedness between English and Danish. Our systems obtained good results across the three languages (.9036 for EN, .7619 for DA, and .7789 for TR).
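
    The two-stage pipeline described above (continued masked-language-model pretraining of mBERT on tweets, then fine-tuning for offensive language classification) can be sketched with the Hugging Face transformers and datasets libraries. This is a minimal illustration under stated assumptions, not the authors' code: the file names (tweets_en_da_tr.txt, offenseval_train.csv), the column names (text, tweet, label), the binary label set, and all hyperparameters are placeholders.

        # Sketch of the retrain-then-fine-tune pipeline; names and settings are
        # illustrative assumptions, not the configuration used in the paper.
        from datasets import load_dataset
        from transformers import (
            AutoModelForMaskedLM,
            AutoModelForSequenceClassification,
            AutoTokenizer,
            DataCollatorForLanguageModeling,
            Trainer,
            TrainingArguments,
        )

        MODEL = "bert-base-multilingual-cased"  # mBERT
        tokenizer = AutoTokenizer.from_pretrained(MODEL)

        # Stage 1: continue masked-language-model pretraining on raw tweets.
        tweets = load_dataset("text", data_files={"train": "tweets_en_da_tr.txt"})
        tweets = tweets.map(
            lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
            batched=True,
            remove_columns=["text"],
        )
        mlm_model = AutoModelForMaskedLM.from_pretrained(MODEL)
        Trainer(
            model=mlm_model,
            args=TrainingArguments(output_dir="mbert-tweets", num_train_epochs=1),
            train_dataset=tweets["train"],
            # 15% random masking, the standard BERT pretraining objective.
            data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
        ).train()
        mlm_model.save_pretrained("mbert-tweets")
        tokenizer.save_pretrained("mbert-tweets")

        # Stage 2: fine-tune the retrained encoder for binary offensive-language
        # detection (assumed columns: "tweet" text, "label" in {0, 1}).
        offenseval = load_dataset("csv", data_files={"train": "offenseval_train.csv"})
        offenseval = offenseval.map(
            lambda ex: tokenizer(ex["tweet"], truncation=True,
                                 padding="max_length", max_length=128),
            batched=True,
        )
        clf = AutoModelForSequenceClassification.from_pretrained("mbert-tweets", num_labels=2)
        Trainer(
            model=clf,
            args=TrainingArguments(output_dir="mbert-offense", num_train_epochs=3),
            train_dataset=offenseval["train"],
        ).train()

    For the transfer-learning configurations mentioned in the abstract, one plausible instantiation is to run Stage 2 on the English data first and then repeat it on the Danish data starting from that English checkpoint, rather than from the retrained encoder directly.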
    Original language: English
    Title of host publication: Proceedings of the 14th International Workshop on Semantic Evaluation
    Editors: Aurelie Herbelot, Xiaodan Zhu, Alexis Palmer, Nathan Schneider, Jonathan May, Ekaterina Shutova
    Publisher: Association for Computational Linguistics (ACL)
    Number of pages: 9
    Publication status: Published - 2020
    Event: Fourteenth Workshop on Semantic Evaluation - Barcelona, Spain
    Duration: 12-Dec-2020 to 13-Dec-2020

    Workshop

    Workshop: Fourteenth Workshop on Semantic Evaluation
    Abbreviated title: SemEval-2020
    Country/Territory: Spain
    City: Barcelona
    Period: 12/12/2020 - 13/12/2020

    Keywords

    • offensive language
    • abusive language
    • BERT
