A Bigger Fish to Fry: Scaling up the Automatic Understanding of Idiomatic Expressions

Hessel Haagsma

    Research output: ThesisThesis fully internal (DIV)

    135 Downloads (Pure)

    Abstract

    In this thesis, we are concerned with idiomatic expressions and how to handle them within NLP. Idiomatic expressions are a type of multiword phrase which have a meaning that is not a direct combination of the meaning of its parts, e.g. 'at a crossroads' and 'move the goalposts'.

    In Part I, we provide a general introduction to idiomatic expressions and an overview of observations regarding idioms based on corpus data. In addition, we discuss existing research on idioms from an NLP perspective, providing an overview of existing tasks, approaches, and datasets. In Part II, we focus on the building of a large idiom corpus, consisting of developing a system for the automatic extraction of potentially idiom expressions and building a large corpus of idiom using crowdsourced annotation. Finally, in Part III, we improve an existing unsupervised classifier and compare it to other existing classifiers. Given the relatively poor performance of this unsupervised classifier, we also develop a supervised deep neural network-based system and find that a model involving two separate modules looking at different information sources yields the best performance, surpassing previous state-of-the-art approaches.

    In conclusion, this work shows the feasibility of building a large corpus of sense-annotated potentially idiomatic expressions, and the benefits such a corpus provides for further research. It provides the possibility for quick testing of hypotheses about the distribution and usage of idioms, it enables the training of data-hungry machine learning methods for PIE disambiguation systems, and it permits fine-grained, reliable evaluation of such systems.
    Original languageEnglish
    QualificationDoctor of Philosophy
    Awarding Institution
    • University of Groningen
    Supervisors/Advisors
    • Bos, Johan, Supervisor
    • Nissim, Malvina, Supervisor
    Award date3-Sep-2020
    Place of Publication[Groningen]
    Publisher
    Print ISBNs9789403425269
    Electronic ISBNs9789403425252
    DOIs
    Publication statusPublished - 2020

    Cite this