Discontinuous parsing with an efficient and accurate DOP model

Research output: Academic, peer-reviewed

15 Citations (Scopus)
27 Downloads (Pure)

Abstract

We present a discontinuous variant of tree-substitution grammar (TSG) based on Linear Context-Free Rewriting Systems. We use this formalism to instantiate a Data-Oriented Parsing model applied to discontinuous treebank parsing, and obtain a significant improvement over earlier results for this task. The model induces a TSG from the treebank by extracting fragments that occur at least twice. We give a direct comparison of a tree-substitution grammar implementation that implicitly represents all fragments from the treebank, versus one that explicitly operates with a significant subset. On the task of discontinuous parsing of German, the latter approach yields a 16% relative error reduction, requiring only a third of the parsing time and grammar size. Finally, we evaluate the model on several treebanks across three Germanic languages.
Original language: English
Title of host publication: Proceedings of IWPT
Place of publication: Nara, Japan
Publisher: Association for Computational Linguistics (ACL)
Pages: 7-16
Number of pages: 10
Status: Published - Nov 2013
Externally published: Yes
