Projecting dialect distances to geography: Bootstrap clustering vs. noisy clustering

John Nerbonne, Wilbert Heeringa, Franz Manni, P. Kleiweg

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review

    31 Citations (Scopus)

    Abstract

    Dialectometry produces aggregate DISTANCE MATRICES in which a distance is specified for each pair of sites. By projecting the groups obtained by clustering onto geography, one can compare the results with traditional dialectology, which produced maps partitioned into implicitly non-overlapping DIALECT AREAS. The importance of dialect areas has been challenged by proponents of CONTINUA, but they too need to compare their findings with the older literature, which is expressed in terms of areas.
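    A minimal sketch of how an aggregate distance matrix can be built. It assumes that each site is represented by one transcription per item and that the per-item measure is plain Levenshtein (edit) distance, averaged over items; the abstract names neither the measure nor the aggregation, so both are assumptions here. The function names, site names and transcriptions are hypothetical.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: insertions, deletions and substitutions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def aggregate_matrix(data):
    """data: site -> list of transcriptions, one per item, in the same item order.

    Returns a symmetric dict (site_a, site_b) -> mean per-item distance.
    """
    dist = {}
    for a, b in combinations(sorted(data), 2):
        items = list(zip(data[a], data[b]))
        d = sum(levenshtein(x, y) for x, y in items) / len(items)
        dist[(a, b)] = dist[(b, a)] = d
    return dist


# Hypothetical data: three sites, two items each.
data = {"A": ["hus", "melk"], "B": ["huis", "melk"], "C": ["haus", "milch"]}
matrix = aggregate_matrix(data)
print(matrix)  # A-B: 0.5, A-C: 2.0, B-C: 2.5 (each stored in both orders)
```

    Clustering then operates only on `matrix`; projecting the resulting groups onto geography amounts to colouring each site by its group label.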

    Simple clustering is unstable: small differences in the input matrix can lead to large differences in the results (Jain et al. 1999). This is illustrated with a 500-site data set from Bulgaria, in which input matrices that correlate very highly (r = 0.97) still yield very different clusterings. Kleiweg et al. (2004) introduce COMPOSITE CLUSTERING, in which random noise is added to the distance matrix before each of many repeated clustering runs. The resulting borders are then projected onto the map.
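    The noisy-clustering idea can be sketched in Python under several assumptions that are not taken from the paper: average-linkage (UPGMA) clustering cut at a fixed number of groups k, Gaussian noise whose standard deviation is a fraction (`noise`) of the standard deviation of all distances, and a border score for each pair of geographically neighbouring sites equal to the fraction of runs in which the two sites land in different groups. The function names, the hand-written neighbour list (in practice derived from site coordinates) and the toy matrix are hypothetical.

```python
import random
import statistics


def upgma_groups(names, dist, k):
    """Average-linkage (UPGMA) clustering of `names`, stopped once k groups remain.

    `dist` maps every ordered pair (a, b), a != b, to a distance.
    Returns a dict site -> group index.
    """
    clusters = [frozenset([n]) for n in names]

    def avg(c1, c2):
        # UPGMA: mean of all pairwise distances between the two clusters' members
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: avg(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)] + [merged]
    return {n: g for g, c in enumerate(clusters) for n in c}


def composite_borders(names, dist, neighbours, k, runs=100, noise=0.2, seed=0):
    """Perturb the matrix with noise, recluster, and score each neighbour border."""
    rng = random.Random(seed)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    # Assumed noise scale: a fraction of the spread of all distances.
    sd = statistics.pstdev([dist[p] for p in pairs])
    separated = {edge: 0 for edge in neighbours}
    for _ in range(runs):
        noisy = {}
        for a, b in pairs:
            d = max(0.0, dist[(a, b)] + rng.gauss(0.0, noise * sd))
            noisy[(a, b)] = noisy[(b, a)] = d  # keep the matrix symmetric
        groups = upgma_groups(names, noisy, k)
        for a, b in neighbours:
            separated[(a, b)] += int(groups[a] != groups[b])
    # Border strength: fraction of runs in which the two neighbours were split.
    return {edge: count / runs for edge, count in separated.items()}


# Hypothetical toy map: two well-separated groups of three sites.
names = ["A", "B", "C", "D", "E", "F"]
west = {"A", "B", "C"}
dist = {(a, b): 10.0 if (a in west) != (b in west) else 1.0
        for a in names for b in names if a != b}
print(composite_borders(names, dist, [("A", "B"), ("C", "D")], k=2))
```

    A border drawn in only a few of the noisy runs receives a low score, so weak borders fade on the projected map while borders that survive perturbation stay strong.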

    The present contribution compares Kleiweg et al.'s procedure with bootstrap resampling, and shows how the procedure used to project borders from composite clustering can also be used to project borders from bootstrapping.
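    Bootstrapping can be sketched along the same lines, again under assumptions of our own: the resampled units are taken to be the items (each with its own per-item distance matrix), drawn with replacement and averaged into a new aggregate matrix; each resample is clustered with average linkage (UPGMA) into k groups, and each border between neighbouring sites is scored as the fraction of runs that split the two sites. All function names, the neighbour list and the toy items are hypothetical.

```python
import random


def upgma_groups(names, dist, k):
    """Average-linkage (UPGMA) clustering, stopped once k groups remain."""
    clusters = [frozenset([n]) for n in names]

    def avg(c1, c2):
        # Mean of all pairwise distances between the two clusters' members
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: avg(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)] + [merged]
    return {n: g for g, c in enumerate(clusters) for n in c}


def bootstrap_borders(names, item_dists, neighbours, k, runs=100, seed=0):
    """Resample items with replacement, rebuild the aggregate matrix, recluster,
    and score each neighbour border by how often the two sites are split."""
    rng = random.Random(seed)
    separated = {edge: 0 for edge in neighbours}
    for _ in range(runs):
        sample = rng.choices(item_dists, k=len(item_dists))  # bootstrap resample
        agg = {pair: sum(m[pair] for m in sample) / len(sample) for pair in sample[0]}
        groups = upgma_groups(names, agg, k)
        for a, b in neighbours:
            separated[(a, b)] += int(groups[a] != groups[b])
    return {edge: count / runs for edge, count in separated.items()}


names = ["A", "B", "C", "D", "E", "F"]
west = {"A", "B", "C"}


def toy_item(offset):
    # Hypothetical item: 0 within each half of the map, 1 across it, plus an offset.
    return {(a, b): float((a in west) != (b in west)) + offset
            for a in names for b in names if a != b}


item_dists = [toy_item(0.1 * i) for i in range(5)]
print(bootstrap_borders(names, item_dists, [("A", "B"), ("C", "D")], k=2))
```

    Only the way each perturbed matrix is produced differs from noisy clustering; the border-scoring step is unchanged, so both kinds of result can be drawn on the map in the same way.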

    Original language: English
    Title of host publication: Data Analysis, Machine Learning and Applications. Proceedings of the 31st Annual Conference of the Gesellschaft für Klassifikation e.V., Albert-Ludwigs Universität Freiburg, March 7-9, 2007
    Editors: C. Preisach, H. Burkhardt, L. Schmidt-Thieme, R. Decker
    Place of publication: Berlin
    Publisher: Springer
    Pages: 647-654
    Number of pages: 8
    ISBN (Print): 978-3-540-78239-1
    Publication status: Published (2008)
    Event: 31st Annual Conference of the German Classification Society, Freiburg, Germany
    Duration: 7 Mar 2007 to 9 Mar 2007

    Publication series

    Name: Studies in Classification, Data Analysis, and Knowledge Organization
    Publisher: Springer-Verlag Berlin
    ISSN (Print): 1431-8814

