Infrequent forms: noise or not?

Martijn Wieling, Simonetta Montemagni



    In this study we ask whether simplifying the data in dialectometrical
    studies by removing infrequent forms helps to uncover the geographical
    structure in dialect data. By investigating lexical variation in a large corpus of
    Tuscan dialect data via hierarchical bipartite spectral graph partitioning, we
    identify the main geographical areas together with their linguistic basis. To
    assess the influence of infrequent forms, we conduct two analyses: one
    that includes only lexical variants used by at least 0.5% of the informants, and
    another that includes all lexical variants in the data. We show
    that using all data yields a geographical characterization with a more
    adequate linguistic basis than using the trimmed data.
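
The trimming step behind the first analysis can be illustrated with a minimal sketch. It is not the authors' code. The input layout (pairs of informant and lexical variant), the function name `trim_infrequent_variants`, and the parameter `min_share` are assumptions made for illustration; only the 0.5% threshold comes from the abstract.

```python
from collections import defaultdict


def trim_infrequent_variants(observations, min_share=0.005):
    """Keep only variants used by at least `min_share` of all informants.

    `observations` is an iterable of (informant_id, variant) pairs.
    This layout is hypothetical; the corpus format is not given here.
    """
    observations = list(observations)

    # All distinct informants in the data set.
    informants = {informant for informant, _ in observations}

    # For every variant, collect the set of informants who used it.
    users = defaultdict(set)
    for informant, variant in observations:
        users[variant].add(informant)

    # A variant survives if enough distinct informants used it.
    threshold = min_share * len(informants)
    kept = {variant for variant, who in users.items() if len(who) >= threshold}

    return [(informant, variant) for informant, variant in observations
            if variant in kept]
```

Under these assumptions, the second analysis corresponds to skipping this function and passing every observation on to the partitioning step.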

    Original language: English
    Title: The Future of Dialects
    Subtitle: Selected papers from Methods in Dialectology XV
    Editors: Marie-Hélène Côté, Remco Knooihuizen, John Nerbonne
    Publisher: Language Science Press
    Number of pages: 10
    ISBN (print): 9783946234197
    Status: Published - 2016
