ReproHum #0033-3: Comparable Relative Results with Lower Absolute Values in a Reproduction Study

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

In the context of the ReproHum project, which aims to assess the reliability of human evaluation, we replicated the human evaluation conducted in “Generating Scientific Definitions with Controllable Complexity” by August et al. (2022). Specifically, human annotators were asked to assess the fluency of scientific definitions automatically generated by three different models, with output complexity varying according to the target audience. Evaluation conditions were kept as close as possible to those of the original study, except for necessary minor adjustments. Although our results show lower absolute performance, relative performance across the three tested systems remains comparable to that observed in the original paper. Based on the lower inter-annotator agreement and the feedback received from annotators in our experiment, we also observe that the ambiguity of the concept being evaluated may play a substantial role in human assessment.

Original language: English
Title of host publication: 4th Workshop on Human Evaluation of NLP Systems, HumEval 2024 at LREC-COLING 2024 - Workshop Proceedings
Editors: Simone Balloccu, Anya Belz, Rudali Huidrom, Ehud Reiter, Joao Sedoc, Craig Thomson
Publisher: European Language Resources Association (ELRA)
Pages: 238-249
Number of pages: 12
ISBN (Electronic): 978-249381441-8
Publication status: Published - 2024
Event: 4th Workshop on Human Evaluation of NLP Systems, HumEval 2024 - Torino, Italy
Duration: 21-May-2024 → 21-May-2024

Conference

Conference: 4th Workshop on Human Evaluation of NLP Systems, HumEval 2024
Country/Territory: Italy
City: Torino
Period: 21/05/2024 → 21/05/2024

Keywords

  • human evaluation
  • reproducibility
  • ReproHum
