Investigating interoperable event corpora: limitations of reusability of resources and portability of models

Tommaso Caselli*, Johan Bos

*Corresponding author for this work

Research output: Contribution to journal, Article, Academic, peer-review

Abstract

Studies on the applicability of heterogeneous semantically interoperable corpora are rare. We investigate to what extent reusability (both of systems and of annotations) is entailed by corpora whose interoperability is based on compliance with standards. In particular, we look at event detection in English texts, supported by the ISO-TimeML annotation scheme. We run two sets of experiments using a common neural network architecture and extensively evaluate our results in both in-distribution and out-of-distribution settings. In all experimental settings, systems obtain state-of-the-art results on the in-distribution data and underperform on the out-of-distribution data, setting limits to the benefits of semantically interoperable corpora. By means of a detailed error analysis, we show that while compliance with a standard guarantees semantic interoperability, it is only a necessary condition for reusability, with factors such as differences in the quality of the annotations having a much stronger impact.

Original language: English
Pages (from-to): 1107–1137
Number of pages: 31
Journal: Language Resources and Evaluation
Volume: 57
Early online date: 26-Feb-2023
Publication status: Published - Sept-2023

Keywords

  • Event detection
  • Portability of systems
  • Reusability of data
  • Semantic interoperability
  • Standards
