Character-based Neural Semantic Parsing

Rik van Noord

    Research output: Thesis fully internal (DIV)


    Abstract

    Humans and computers do not speak the same language. Many day-to-day tasks would be vastly more efficient if we could communicate with computers in natural language instead of relying on an interface. The computer must therefore not treat a sentence as a mere collection of individual words, but understand its deeper, compositional meaning. One way to tackle this problem is to automatically assign each sentence a formal, structured meaning representation, which is easy for computers to interpret. There have been quite a few attempts at this before, but these approaches usually relied heavily on predefined rules, word lists, or representations of the syntax of the text, which made them quite complicated to use in general.

    In this thesis we employ an algorithm that learns to automatically assign meaning representations to texts without using any such external resources. Specifically, we use a type of artificial neural network called a sequence-to-sequence model, in a process often referred to as deep learning. The devil is in the details, but we find that this type of algorithm can produce high-quality meaning representations, with better performance than the more traditional methods. Moreover, a main finding of the thesis is that, counterintuitively, it is often better to represent the text as a sequence of individual characters rather than words. This is likely because it helps the model deal with spelling errors, unknown words, and inflections.
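    The intuition behind the character-level finding can be sketched in a few lines. This is an illustrative example, not the thesis's actual code: a word-level model maps any word outside its training vocabulary to an unknown-word token, while a character-level model decomposes every input, including misspellings and rare inflections, into symbols it has already seen.

    ```python
    def word_tokens(sentence, vocab):
        # Word-level input: anything outside the vocabulary collapses to <unk>,
        # so the model loses all information about the word.
        return [w if w in vocab else "<unk>" for w in sentence.split()]

    def char_tokens(sentence):
        # Character-level input: even a misspelled word decomposes into
        # characters the model has seen during training.
        return list(sentence)

    vocab = {"the", "cat", "sleeps"}
    print(word_tokens("the catt sleeps", vocab))  # ['the', '<unk>', 'sleeps']
    print(char_tokens("catt"))                    # ['c', 'a', 't', 't']
    ```

    The misspelling "catt" is invisible to the word-level representation but still largely recoverable from its characters, which is the kind of robustness the thesis attributes to character-based models.
    
    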
    Original language: English
    Qualification: Doctor of Philosophy
    Awarding institution
    • Rijksuniversiteit Groningen
    Supervisor(s)/advisor
    • Bos, Johan, Supervisor
    • Toral Ruiz, Antonio, Co-supervisor
    Award date: 31 May 2021
    Place of publication: [Groningen]
    Status: Published - 2021
