Discontinuous parsing with an efficient and accurate DOP model

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

15 Citations (Scopus)
27 Downloads (Pure)

Abstract

We present a discontinuous variant of tree-substitution grammar (TSG) based on Linear Context-Free Rewriting Systems. We use this formalism to instantiate a Data-Oriented Parsing (DOP) model applied to discontinuous treebank parsing, and obtain a significant improvement over earlier results for this task. The model induces a TSG from the treebank by extracting fragments that occur at least twice. We give a direct comparison of a tree-substitution grammar implementation that implicitly represents all fragments from the treebank, versus one that explicitly operates with a significant subset. On the task of discontinuous parsing of German, the latter approach yields a 16% relative error reduction, requiring only a third of the parsing time and grammar size. Finally, we evaluate the model on several treebanks across three Germanic languages.
Original language: English
Title of host publication: Proceedings of IWPT
Place of publication: Nara, Japan
Publisher: Association for Computational Linguistics (ACL)
Pages: 7-16
Number of pages: 10
Publication status: Published - Nov 2013
Externally published: Yes
