A Hybrid Rule-Based and Neural Coreference Resolution System with an Evaluation on Dutch Literature

Andreas van Cranenburgh*, Esther Ploeger, Frank van den Berg, Remi Thüss

*Corresponding author for this work

    Research output: Academic › peer review


    Abstract

    We introduce a modular, hybrid coreference resolution system that extends a rule-based baseline with three neural classifiers for the subtasks of mention detection, mention attributes (gender, animacy, number), and pronoun resolution. In our experiments with Dutch literature, the classifiers substantially increase coreference performance on the development set across all metrics: mention detection, LEA, CoNLL, and especially pronoun accuracy. On the test set, however, the best results are obtained with rule-based pronoun resolution. This inconsistent result highlights that the rule-based system is still a strong baseline, and that more work is needed to improve pronoun resolution robustly for this dataset. While end-to-end neural systems require no feature engineering and achieve excellent performance on standard benchmarks with large training sets, our simple hybrid system scales well to long-document coreference (>10k words) and attains superior results in our experiments on literature.
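    The modular design described in the abstract, where each subtask can be handled either by rules or by a learned classifier, can be illustrated with a minimal Python sketch. Everything in it is hypothetical and greatly simplified: the `Mention` class, the toy lexicons `PRONOUNS` and `NAMES`, the stage functions, and `toy_score` are illustrative stand-ins, not the authors' implementation. The rule-based resolver here is a naive "closest compatible antecedent" heuristic, and `scored_pronouns` merely shows where a trained pronoun classifier could be plugged in as a scoring function.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Mention:
    start: int                          # token index of the (single-token) mention
    text: str
    is_pronoun: bool
    gender: Optional[str] = None        # "m", "f", or None if unknown
    number: Optional[str] = None        # "sg" or "pl"
    antecedent: Optional[int] = None    # index into the mention list

# A stage takes the tokens and the mentions so far, and returns updated mentions.
Stage = Callable[[List[str], List[Mention]], List[Mention]]

# Hypothetical toy lexicons; far from complete (e.g. Dutch "zij" can also be plural).
PRONOUNS = {"hij": ("m", "sg"), "hem": ("m", "sg"), "zij": ("f", "sg"), "haar": ("f", "sg")}
NAMES = {"Jan": "m", "Marie": "f"}

def rule_mentions(tokens: List[str], _: List[Mention]) -> List[Mention]:
    """Rule-based mention detection: known pronouns and known names only."""
    return [Mention(i, tok, tok.lower() in PRONOUNS)
            for i, tok in enumerate(tokens)
            if tok.lower() in PRONOUNS or tok in NAMES]

def rule_attributes(tokens: List[str], mentions: List[Mention]) -> List[Mention]:
    """Rule-based gender/number assignment via lexicon lookup."""
    for m in mentions:
        if m.is_pronoun:
            m.gender, m.number = PRONOUNS[m.text.lower()]
        else:
            m.gender, m.number = NAMES.get(m.text), "sg"
    return mentions

def compatible(a: Mention, b: Mention) -> bool:
    return a.gender == b.gender and a.number == b.number

def rule_pronouns(tokens: List[str], mentions: List[Mention]) -> List[Mention]:
    """Link each pronoun to the closest preceding compatible non-pronoun mention."""
    for i, m in enumerate(mentions):
        if m.is_pronoun:
            for j in range(i - 1, -1, -1):
                cand = mentions[j]
                if not cand.is_pronoun and compatible(m, cand):
                    m.antecedent = j
                    break
    return mentions

def scored_pronouns(score: Callable[[Mention, Mention], float]) -> Stage:
    """Build a drop-in pronoun stage that picks the highest-scoring candidate.

    `score` stands in for a trained classifier (hypothetical); any callable works.
    """
    def stage(tokens: List[str], mentions: List[Mention]) -> List[Mention]:
        for i, m in enumerate(mentions):
            if m.is_pronoun:
                cands = [j for j in range(i) if not mentions[j].is_pronoun]
                if cands:
                    m.antecedent = max(cands, key=lambda j: score(m, mentions[j]))
        return mentions
    return stage

def run_pipeline(tokens: List[str], stages: List[Stage]) -> List[Mention]:
    """Run the stages in order, threading the mention list through them."""
    mentions: List[Mention] = []
    for stage in stages:
        mentions = stage(tokens, mentions)
    return mentions

def toy_score(p: Mention, c: Mention) -> float:
    # Reward gender agreement, penalise distance (illustrative weights only).
    return (2.0 if p.gender == c.gender else 0.0) - 0.1 * (p.start - c.start)

tokens = "Jan zag Marie . Hij groette haar .".split()
rule_based = run_pipeline(tokens, [rule_mentions, rule_attributes, rule_pronouns])
hybrid = run_pipeline(tokens, [rule_mentions, rule_attributes, scored_pronouns(toy_score)])
print([m.antecedent for m in rule_based])  # [None, None, 0, 1]
print([m.antecedent for m in hybrid])      # [None, None, 0, 1]
```

    Because every stage shares one signature, swapping the rule-based pronoun stage for a classifier-backed one is a one-line change in the stage list. This kind of interchangeability is what makes it possible to compare rule-based and neural pronoun resolution on the same data, as the abstract reports.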
    Original language: English
    Title of host publication: Proceedings of the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference
    Editors: Maciej Ogrodniczuk, Sameer Pradhan, Massimo Poesio, Yulia Grishina, Vincent Ng
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 47-56
    Number of pages: 10
    Status: Published - Nov 2021
