Abstract
This work describes a self-supervised data augmentation approach used to improve the performance of learning models when only a moderate amount of labeled data is available. Multiple copies of the original model are first trained on the downstream task. Their predictions are then used to annotate a large set of unlabeled examples. Finally, multi-task training is performed on the parallel annotations of the resulting training set, and final scores are obtained by averaging annotator-specific head predictions. Neural language models are fine-tuned using this procedure in the context of the AcCompl-it shared task at EVALITA 2020, obtaining considerable improvements in prediction quality.
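The multi-task stage of this procedure lends itself to a compact implementation. Below is a minimal PyTorch sketch, assuming a generic sentence encoder, a regression objective, and five annotator copies; the names (`MultiHeadModel`, `multitask_loss`), the dimensions, and the loss choice are illustrative assumptions, not the exact setup used in the paper.

```python
# Minimal sketch of multi-task training on parallel "silver" annotations,
# with final scores obtained by averaging annotator-specific head outputs.
# Encoder, dimensions, and loss are illustrative assumptions.
import torch
import torch.nn as nn

class MultiHeadModel(nn.Module):
    """Shared encoder with one prediction head per annotator copy."""
    def __init__(self, encoder: nn.Module, hidden_size: int, n_heads: int):
        super().__init__()
        self.encoder = encoder
        self.heads = nn.ModuleList(nn.Linear(hidden_size, 1) for _ in range(n_heads))

    def forward(self, x):
        h = self.encoder(x)                                   # (batch, hidden_size)
        return torch.stack([head(h) for head in self.heads], dim=1)  # (batch, n_heads, 1)

def multitask_loss(preds, parallel_labels):
    # parallel_labels: (batch, n_heads) scores produced by the trained model copies
    return nn.functional.mse_loss(preds.squeeze(-1), parallel_labels)

def predict(model, x):
    # Final score: average of the annotator-specific head predictions
    with torch.no_grad():
        return model(x).squeeze(-1).mean(dim=1)

if __name__ == "__main__":
    # Toy stand-in for a fine-tuned language model encoder (hypothetical dimensions).
    hidden = 16
    encoder = nn.Sequential(nn.Linear(32, hidden), nn.ReLU())
    model = MultiHeadModel(encoder, hidden_size=hidden, n_heads=5)

    x = torch.randn(8, 32)                # batch of encoded "sentences"
    silver_labels = torch.randn(8, 5)     # parallel annotations from 5 trained copies
    loss = multitask_loss(model(x), silver_labels)
    loss.backward()
    print(predict(model, x).shape)        # torch.Size([8])
```

Averaging the head outputs at inference time acts as an ensemble over the annotator copies, realized within a single fine-tuned model.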
Original language | English |
---|---|
Title | Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020) |
Editors | Valerio Basile, Danilo Croce, Maria Di Maro, Lucia Passaro |
Place of publication | Online |
Publisher | CEUR Workshop Proceedings (CEUR-WS.org) |
Status | Published - 17 Dec 2020 |
Externally published | Yes |
Event | Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, Online. Duration: 17 Dec 2020 → …; Conference number: 7 |
Workshop
Workshop | Evaluation Campaign of Natural Language Processing and Speech Tools for Italian |
---|---|
Abbreviated title | EVALITA 2020 |
Period | 17/12/2020 → … |