Abstract
Stylometric and text categorization results show that author gender can be discerned in texts with relatively high accuracy. However, it is difficult to explain what gives rise to these results, and there are many possible confounding factors, such as the domain, genre, and target audience of a text. More fundamentally, such classification efforts risk invoking stereotyping and essentialism. We explore this issue in two datasets of Dutch literary novels, using widely used descriptive (LIWC, topic modeling) and predictive (machine learning) methods. Our results show the importance of controlling for variables in the corpus, and we argue for caution against overgeneralizing from such results.
Field | Value |
---|---|
Original language | English |
Host publication title | Proceedings of the First Ethics in NLP workshop |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 12-22 |
Number of pages | 11 |
DOIs | |
Status | Published - 2017 |