The Corpora They Are a-Changing: a Case Study in Italian Newspapers

Pierpaolo Basile, Annalina Caputo, Tommaso Caselli, Pierluigi Cassotti, Rossella Varvara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

The use of automatic methods for the study of lexical semantic change (LSC) has led to the creation of evaluation benchmarks. Benchmark datasets, however, are intimately tied to the corpus used to create them, which calls into question both their reliability and the robustness of automatic methods. This contribution investigates these aspects, showing the impact of unforeseen social and cultural dimensions. We also identify a set of additional issues (OCR quality, named entities) that affect the performance of automatic methods, especially when they are used to discover LSC.

Original language: English
Title of host publication: Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change
Editors: Nina Tahmasebi, Adam Jatowt, Yang Xu, Simon Hengchen, Syrielle Montariol, Haim Dubossarsky
Publisher: Association for Computational Linguistics (ACL)
Pages: 14-20
Number of pages: 7
Publication status: Published, 27 July 2021