Extraction of phrase-structure fragments with a linear average time tree kernel

Andreas van Cranenburgh*

*Corresponding author for this work

Research output: Academic, peer-reviewed

4 Citations (Scopus)
31 Downloads (Pure)

Abstract

We present an algorithm and implementation for extracting recurring fragments from treebanks. Using a tree-kernel method, the largest common fragments are extracted from each pair of trees. The algorithm presented achieves a thirty-fold speedup over the previously available method on the Wall Street Journal dataset. It is also more general, in that it supports trees with discontinuous constituents. The resulting fragments can be used as a tree-substitution grammar or in classification problems such as authorship attribution and other stylometry tasks.
Original language: English
Pages (from-to): 3-16
Number of pages: 14
Journal: Computational Linguistics in the Netherlands Journal
Volume: 4
Status: Published - 1 Dec 2014
Externally published: Yes
