A Diachronic Italian Corpus based on “L’Unità”

Pierpaolo Basile, Annalina Caputo, Tommaso Caselli, Pierluigi Cassotti, Rossella Varvara

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    Abstract

    In this paper, we describe the creation of a diachronic corpus for Italian by exploiting the digital archive of the newspaper “L’Unità”. We automatically clean and annotate the corpus with PoS tags, lemmas, named entities and syntactic dependencies. Moreover, we compute frequency-based time series for tokens, lemmas and entities. We report corpus statistics that take the temporal dimension into account and describe some example uses of the time series.
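As a rough illustration of what a frequency-based time series for tokens could look like, the sketch below computes per-year relative frequencies from already tokenized documents. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `frequency_time_series`, the yearly granularity, the normalization by total tokens per year, and the toy documents are all hypothetical.

```python
from collections import Counter, defaultdict


def frequency_time_series(docs):
    """Build {token: {year: relative frequency}} from (year, tokens) pairs.

    Assumption: frequencies are normalized by the total number of tokens
    observed in each year; the source does not specify the normalization.
    """
    counts = defaultdict(Counter)  # year -> token counts for that year
    for year, tokens in docs:
        counts[year].update(tokens)

    series = defaultdict(dict)
    for year in sorted(counts):
        total = sum(counts[year].values())
        for token, n in counts[year].items():
            series[token][year] = n / total
    return dict(series)


# Toy input (invented for illustration, not taken from the corpus).
docs = [
    (1948, ["partito", "lavoro", "partito"]),
    (1948, ["lavoro"]),
    (1990, ["europa", "partito"]),
]
series = frequency_time_series(docs)
print(series["partito"])
```

The same function would apply to lemmas or named entities by passing those in place of surface tokens.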
    Original language: English
    Title of host publication: CLiC-it 2020 Italian Conference on Computational Linguistics 2020
    Subtitle of host publication: Proceedings of the Seventh Italian Conference on Computational Linguistics
    Place of Publication: Bologna
    Publisher: CEUR Workshop Proceedings (CEUR-WS.org)
    Number of pages: 6
    Volume: 2769
    Publication status: Published - 2020
    Event: Italian Conference on Computational Linguistics 2020 - Bologna, Italy
    Duration: 1-Mar-2021 to 3-Mar-2021

    Conference

    Conference: Italian Conference on Computational Linguistics 2020
    Abbreviated title: CLiC-it 2020
    Country/Territory: Italy
    City: Bologna
    Period: 01/03/2021 to 03/03/2021

    Keywords

    • diachronic corpus
    • lexical semantics
    • concept shifts
    • Italian
    • written corpus
