Abstract
We present a discontinuous variant of tree-substitution grammar (TSG) based on Linear Context-Free Rewriting Systems. We use this formalism to instantiate a Data-Oriented Parsing model applied to discontinuous treebank parsing, and obtain a significant improvement over earlier results for this task. The model induces a TSG from the treebank by extracting fragments that occur at least twice. We give a direct comparison of a tree-substitution grammar implementation that implicitly represents all fragments from the treebank, versus one that explicitly operates with a significant subset. On the task of discontinuous parsing of German, the latter approach yields a 16% relative error reduction while requiring only a third of the parsing time and grammar size. Finally, we evaluate the model on several treebanks across three Germanic languages.
Original language | English |
---|---|
Title of host publication | Proceedings of IWPT |
Place of Publication | Nara, Japan |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 7-16 |
Number of pages | 10 |
Publication status | Published - Nov-2013 |
Externally published | Yes |