Abstract
The paper describes the creation of a manually validated dataset of Italian multiword expressions, building on candidates automatically extracted from corpora of written Italian. The main features of the resource, such as POS-pattern and lemma distribution, are also discussed, together with possible applications.
Original language | English |
---|---|
Title of host publication | Proceedings of the Seventh Italian Conference on Computational Linguistics, CLiC-it 2020, Bologna, Italy, March 1-3, 2021 |
Editors | Johanna Monti, Felice Dell'Orletta, Fabio Tamburini |
Publisher | CEUR-WS.org |
Number of pages | 5 |
Volume | 2769 |
Publication status | Published - 2020 |
Event | Italian Conference on Computational Linguistics 2020, Bologna, Italy. Duration: 1-Mar-2021 → 3-Mar-2021 |
Conference
Conference | Italian Conference on Computational Linguistics 2020 |
---|---|
Abbreviated title | CLiC-it 2020 |
Country | Italy |
City | Bologna |
Period | 01/03/2021 → 03/03/2021 |