Abstract
We ask subjects whether they perceive a set of texts as human-produced; some of the texts are actually human-written, while others are automatically generated. We use these data to fine-tune a GPT-2 model so that it generates more human-like texts, and observe that the fine-tuned model produces texts that are indeed perceived as more human-like than those of the original model. At the same time, we show that our automatic evaluation strategy correlates well with human judgements. We also run a linguistic analysis to unveil the characteristics of human- vs. machine-perceived language.
Original language | English |
---|---|
Title of host publication | Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics |
Editors | Antoine Bosselut, Esin Durmus, Varun Prashant Gangal, Sebastian Gehrmann, Yacine Jernite, Laura Perez-Beltrachini, Samira Shaikh, Wei Xu |
Place of Publication | Bangkok, Thailand |
Publisher | Association for Computational Linguistics, ACL Anthology |
Pages | 15-23 |
Number of pages | 9 |
Publication status | Published - 2021 |