Investigating interoperable event corpora: limitations of reusability of resources and portability of models

Tommaso Caselli*, Johan Bos

*Corresponding author for this work

Research output: Academic, peer-reviewed

Abstract

Studies on the applicability of heterogeneous semantically interoperable corpora are rare. We investigate to what extent reusability (both of systems and of annotations) is entailed by corpora whose interoperability is based on compliance with standards. In particular, we look at event detection in English texts, supported by the ISO-TimeML annotation scheme. We run two sets of experiments using a common neural network architecture and extensively evaluate our results in both in-distribution and out-of-distribution settings. In all experimental settings, systems obtain state-of-the-art results on the in-distribution data and underperform on the out-of-distribution data, which limits the benefits of semantically interoperable corpora. By means of a detailed error analysis, we show that although compliance with a standard guarantees semantic interoperability, it is only a necessary condition for reusability; factors such as differences in annotation quality have a much stronger impact.

Original language: English
Pages (from-to): 1107–1137
Number of pages: 31
Journal: Language Resources and Evaluation
Volume: 57
Early online date: 26 Feb 2023
Status: Published - Sep 2023
