Multiword Expressions We Live by: A Validated Usage-based Dataset from Corpora of Written Italian

Francesca Masini, M. Silvia Micheli, Andrea Zaninello, Sara Castagnoli, Malvina Nissim

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Academic > peer-review

    Abstract

    The paper describes the creation of a manually validated dataset of Italian multiword expressions, building on candidates automatically extracted from corpora of written Italian. The main features of the resource, such as POS-pattern and lemma distribution, are also discussed, together with possible applications.

    Original language: English
    Title of host publication: Proceedings of the Seventh Italian Conference on Computational Linguistics, CLiC-it 2020, Bologna, Italy, March 1-3, 2021
    Editors: Johanna Monti, Felice Dell'Orletta, Fabio Tamburini
    Publisher: CEUR-WS.org
    Number of pages: 5
    Volume: 2769
    Publication status: Published - 2020
    Event: Italian Conference on Computational Linguistics 2020 - Bologna, Italy
    Duration: 1-Mar-2021 to 3-Mar-2021

    Conference

    Conference: Italian Conference on Computational Linguistics 2020
    Abbreviated title: CLiC-it 2020
    Country: Italy
    City: Bologna
    Period: 01/03/2021 to 03/03/2021
