Literary authorship attribution with phrase-structure fragments

Andreas van Cranenburgh*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

13 Citations (Scopus)
68 Downloads (Pure)

Abstract

We present a method of authorship attribution and stylometry that exploits hierarchical information in phrase structures. Contrary to much previous work in stylometry, we focus on content words rather than function words. Both the texts of known authorship and the texts to be analyzed are parsed to obtain phrase structures, which are then compared. An efficient tree kernel method identifies the tree fragments that the data of known authors share with the unknown texts. These fragments are then used to identify authors and characterize their styles. Our experiments show that the structural information from fragments is complementary to the information captured by the baseline trigram model.
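
The idea of scoring texts by the tree fragments they share can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not spell out its kernel, so the sketch substitutes the classic Collins-Duffy subset-tree kernel, which counts (rather than lists) the fragments two parse trees have in common, and then assigns an unknown text to the author whose known trees share the most fragments with it on average. The tuple tree encoding, the function names `tree_kernel` and `attribute`, and the averaging score are all hypothetical choices made for illustration. Because leaves are words, fragments can include content words, in line with the focus described above.

```python
from functools import lru_cache

# Hypothetical encoding: a tree is a tuple (label, child, ...); a word is a str.


def production(node):
    """The node's label followed by the labels (or words) of its children."""
    return (node[0],) + tuple(c if isinstance(c, str) else c[0] for c in node[1:])


def nodes(tree):
    """All internal nodes of a tree, in preorder."""
    result = [tree]
    for child in tree[1:]:
        if not isinstance(child, str):
            result.extend(nodes(child))
    return result


def tree_kernel(t1, t2):
    """Count tree fragments shared by t1 and t2 (Collins-Duffy subset-tree kernel)."""

    @lru_cache(maxsize=None)
    def common(n1, n2):
        # Number of common fragments rooted at both n1 and n2.
        if production(n1) != production(n2):
            return 0
        result = 1
        for c1, c2 in zip(n1[1:], n2[1:]):
            if not isinstance(c1, str):  # words were already matched by production()
                result *= 1 + common(c1, c2)
        return result

    return sum(common(a, b) for a in nodes(t1) for b in nodes(t2))


def attribute(unknown, known):
    """Pick the author whose known trees share most fragments with the unknown trees.

    `known` maps author -> list of trees; the average-overlap score is an assumption.
    """
    def score(trees):
        return sum(tree_kernel(t, u) for t in trees for u in unknown) / len(trees)

    return max(known, key=lambda author: score(known[author]))


known = {
    "A": [("S", ("NP", ("D", "the"), ("N", "cat")), ("VP", ("V", "sat")))],
    "B": [("S", ("NP", ("N", "dogs")), ("VP", ("V", "bark"), ("ADV", "loudly")))],
}
unknown = [("S", ("NP", ("D", "the"), ("N", "cat")), ("VP", ("V", "slept")))]
print(attribute(unknown, known))  # A
```

Here the unknown sentence shares 17 fragments with author A's tree (including the whole noun phrase "the cat") but only 1 with author B's, so it is attributed to A. A counting kernel keeps the sketch short; a system that wants to characterize style would instead need the fragments themselves, which is what the abstract describes.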

Original language: English
Title of host publication: Proceedings of the NAACL-HLT 2012 Workshop on Computational Linguistics for Literature
Place of publication: Montréal, Canada
Publisher: Association for Computational Linguistics (ACL)
Pages: 59-63
Number of pages: 5
Publication status: Published - June 2012
Externally published: Yes

