Abstract
We present a discontinuous variant of tree-substitution grammar (TSG) based on Linear Context-Free Rewriting Systems. We use this formalism to instantiate a Data-Oriented Parsing model applied to discontinuous treebank parsing, and obtain a significant improvement over earlier results for this task. The model induces a TSG from the treebank by extracting fragments that occur at least twice. We give a direct comparison of a tree-substitution grammar implementation that implicitly represents all fragments from the treebank versus one that explicitly operates with a significant subset. On the task of discontinuous parsing of German, the latter approach yields a 16% relative error reduction while requiring only a third of the parsing time and grammar size. Finally, we evaluate the model on several treebanks across three Germanic languages.
Original language | English |
---|---|
Title | Proceedings of IWPT |
Place of production | Nara, Japan |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 7-16 |
Number of pages | 10 |
Status | Published - Nov 2013 |
Externally published | Yes |