Abstract
This work describes a self-supervised data augmentation approach used to improve the performance of learning models when only a moderate amount of labeled data is available. Multiple copies of the original model are first trained on the downstream task. Their predictions are then used to annotate a large pool of unlabeled examples. Finally, multi-task training is performed on the parallel annotations of the resulting training set, and final scores are obtained by averaging the predictions of the annotator-specific heads. Neural language models are fine-tuned with this procedure in the context of the AcCompl-it shared task at EVALITA 2020, yielding considerable improvements in prediction quality.
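Since the abstract only sketches the procedure in prose, the following is a minimal illustrative implementation of its four steps: train K model copies, pseudo-label unlabeled data, multi-task train a shared encoder with one head per annotator, and average head predictions. It assumes a toy feed-forward encoder and random tensors in place of the pre-trained language model and real task data; all names (`make_model`, `MultiHead`, `K`, etc.) are hypothetical, not the authors' code.

```python
# Hedged sketch of the multi-annotator self-training scheme from the abstract.
# The toy encoder stands in for a pre-trained neural language model.
import torch
import torch.nn as nn
import torch.nn.functional as F

K = 3            # number of independently trained model copies ("annotators")
NUM_CLASSES = 2  # e.g. acceptable / not acceptable
DIM = 16         # toy feature dimension standing in for LM embeddings

def make_model():
    return nn.Sequential(nn.Linear(DIM, 32), nn.ReLU(), nn.Linear(32, NUM_CLASSES))

def train(model, x, y, epochs=50):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()

# Step 1: train K copies on the (small) labeled downstream data.
x_lab = torch.randn(64, DIM)
y_lab = torch.randint(0, NUM_CLASSES, (64,))
annotators = [make_model() for _ in range(K)]
for m in annotators:
    train(m, x_lab, y_lab)

# Step 2: each trained copy annotates a large pool of unlabeled examples,
# producing K parallel (pseudo-)annotations of the same data.
x_unlab = torch.randn(512, DIM)
with torch.no_grad():
    pseudo_labels = [m(x_unlab).argmax(dim=1) for m in annotators]

# Step 3: multi-task training, one classification head per annotator
# on top of a single shared encoder.
class MultiHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(DIM, 32), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(32, NUM_CLASSES) for _ in range(K))

    def forward(self, x):
        h = self.encoder(x)
        return [head(h) for head in self.heads]

final = MultiHead()
opt = torch.optim.Adam(final.parameters(), lr=1e-2)
for _ in range(50):
    opt.zero_grad()
    logits = final(x_unlab)
    # one loss term per annotator-specific head, summed over heads
    loss = sum(F.cross_entropy(l, y) for l, y in zip(logits, pseudo_labels))
    loss.backward()
    opt.step()

# Step 4: final scores average the annotator-specific head predictions.
with torch.no_grad():
    avg = torch.stack(final(x_lab)).mean(dim=0)
    preds = avg.argmax(dim=1)
```

One plausible reading of the design: averaging annotator-specific heads over a shared encoder approximates an ensemble of the K copies while training and serving only a single model.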
| Original language | English |
| --- | --- |
| Title of host publication | Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020) |
| Editors | Valerio Basile, Danilo Croce, Maria Di Maro, Lucia Passaro |
| Place of publication | Online |
| Publisher | CEUR Workshop Proceedings (CEUR-WS.org) |
| Publication status | Published - 17-Dec-2020 |
| Externally published | Yes |
| Event | Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, Online. Duration: 17-Dec-2020 → …. Conference number: 7 |
Workshop
| Workshop | Evaluation Campaign of Natural Language Processing and Speech Tools for Italian |
| --- | --- |
| Abbreviated title | EVALITA 2020 |
| Period | 17/12/2020 → … |
Keywords
- natural language processing
- deep learning
- self-training
- neural language models
- multi-task learning
- linguistic complexity
- linguistic acceptability