Towards a Dutch FrameNet lexicon and parser using the data-to-text method

Gosse Minnema, Levi Remijnse

    Research output: Contribution to conference › Poster › Academic

    Abstract

    Our presentation introduces the Dutch FrameNet project, whose major outcomes will be a FrameNet-based lexicon and semantic parser for Dutch. This project implements the ‘data-to-text’ method (Vossen et al., LREC 2018), which involves collecting structured data about specific types of real-world events, and then linking this to texts referring to these events. By contrast, earlier FrameNet projects started from text corpora without assumptions about the events they describe. As a consequence, these projects cover a wide variety of events and situations (‘frames’), but have a limited number of annotated examples for every frame. By starting from structured domains, we avoid this sparsity problem, facilitating both machine learning and qualitative analyses on texts in the domains we annotate. Moreover, the data-to-text approach allows us to study the three-way relationship between texts, structured data, and frames, highlighting how real-world events are ‘framed’ in texts.

    We will discuss the implications of using the data-to-text method for the design and theoretical framework of the Dutch FrameNet and for automatic parsing. First of all, a major departure from traditional frame semantics is that we can use structured data to enrich and inform our frame analyses. For example, certain frames have a strong conceptual link to specific events (e.g., a text cannot describe a murder event without evoking the Killing frame), but texts describing these events may evoke these frames in an implicit way (e.g., a murder described without explicitly using words like ‘kill’), which would lead these events to be missed by traditional FrameNet annotations.

    Moreover, we will investigate how texts refer to the structured data and how to model this in a useful way for annotators. We theorize that variation in descriptions of the real world is driven by pragmatic requirements (e.g., Gricean maxims; Weigand, 1998) and shared event knowledge. For instance, the sentence ‘Feyenoord hit the goal twice’ implies that Feyenoord scored two points, but this conclusion requires knowledge of Feyenoord and what football matches are like. We will present both an analysis of the influence of world knowledge and pragmatic factors on variation in lexical reference, and ways to model this variation in order to annotate references within and between texts concerning the same event.

    Automatic frame semantic parsing will adopt a multilingual approach: the data-to-text approach makes it relatively easy to gather a corpus of texts in different languages describing the same events. We aim to use techniques such as cross-lingual annotation projection (Evang & Bos, COLING 2016) to adapt existing parsers and resources developed for English to Dutch, our primary target language, but also to Italian, which will help us make FrameNet and semantic parsers based on it more language-independent. Our parsers will be integrated into the Parallel Meaning Bank project (Abzianidze et al., EACL 2017).

    Original language: English
    Publication status: Published - Jan-2020
    Event: Computational Linguistics in the Netherlands - Utrecht, Netherlands
    Duration: 30-Jan-2020 to 30-Jan-2020
    Conference number: 30
    https://clin30.sites.uu.nl/

    Conference

    Conference: Computational Linguistics in the Netherlands
    Abbreviated title: CLIN
    Country: Netherlands
    City: Utrecht
    Period: 30/01/2020 to 30/01/2020