Abstract
Literature is hard to define. The value-judgment definition holds that literature is a highly valued kind of writing [2, p. 9], but how arbitrary or predictable are such judgments? Moreover, some believe that critics and publishers wield more influence than the text itself [1]. We investigate these questions with a computational model of literature trained on texts. As part of The Riddle of Literary Quality (http://literaryquality.huygens.knaw.nl), an online survey (14k respondents) was conducted among the general public to collect judgments on 401 recent, bestselling Dutch novels. Given a list of author-title pairs, respondents rated the novels they had read on a 7-point scale from "definitely not literary" to "highly literary". We consider the regression task of predicting the mean rating of each novel using features extracted from its text. We train a linear support vector regression model on frequencies of bigrams and syntactic features. The syntactic features consist of tree fragments mined from trees obtained by automatically parsing the novels. Our predictive model explains 57.5% of the variance in literary ratings, with a root mean squared error of 0.65 on a scale of 0–7 (evaluation based on 5-fold cross-validation with the 401 novels).
This is in line with pilot experiments with a subset of the novels and only bigrams [3]. Although the bigrams form a simple, strong baseline, the syntactic features are more interpretable. We conclude that perceptions of literary ratings can be explained to a large extent from the text itself: there is an intrinsic literariness to literary texts.
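The evaluation setup described in the abstract (bigram frequencies as features, 5-fold cross-validation, explained variance and root mean squared error as metrics) can be sketched roughly as follows. This is a minimal standard-library illustration, not the authors' implementation: the helper names, the whitespace tokenization, the fold assignment, and the toy ratings are all assumptions made for the example, and a mean-rating baseline stands in for the linear support vector regression model, which is not reproduced here.

```python
import math
from collections import Counter


def bigram_frequencies(text):
    """Relative frequencies of word bigrams (naive whitespace tokenization)."""
    tokens = text.lower().split()
    counts = Counter(zip(tokens, tokens[1:]))
    total = sum(counts.values())
    return {bigram: n / total for bigram, n in counts.items()} if total else {}


def kfold_indices(n_items, n_folds=5):
    """Yield (train, test) index lists; item i goes to fold i % n_folds."""
    folds = [list(range(i, n_items, n_folds)) for i in range(n_folds)]
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n_items) if i not in held_out]
        yield train, test


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Proportion of variance in y_true explained by y_pred."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_validated_predictions(ratings, fit_predict, n_folds=5):
    """Collect one out-of-fold prediction per item."""
    preds = [None] * len(ratings)
    for train, test in kfold_indices(len(ratings), n_folds):
        for i, p in zip(test, fit_predict(train, test)):
            preds[i] = p
    return preds


def mean_baseline(ratings):
    """Stand-in model (assumption): predict the mean rating of the training fold."""
    def fit_predict(train, test):
        mean = sum(ratings[i] for i in train) / len(train)
        return [mean] * len(test)
    return fit_predict


# Made-up mean ratings for ten hypothetical novels (not survey data).
toy_ratings = [2.1, 3.4, 5.6, 4.2, 6.0, 3.3, 2.8, 5.1, 4.7, 3.9]
toy_preds = cross_validated_predictions(toy_ratings, mean_baseline(toy_ratings))
print("RMSE:", rmse(toy_ratings, toy_preds))
print("R^2:", r_squared(toy_ratings, toy_preds))
```

Any regressor that maps training-fold features to held-out predictions could replace `mean_baseline` here. Scoring only out-of-fold predictions is what makes the reported explained variance an estimate of performance on unseen novels.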
Original language | English |
---|---|
Journal | Tiny Transactions on Computer Science |
Volume | 4 |
Status | Published - 2016 |