Ensemble similarity measures for clustering terms

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    4 Citations (Scopus)
    42 Downloads (Pure)

    Abstract

    Clustering semantically related terms is crucial for many applications, such as document categorization and word sense disambiguation. However, automatically identifying semantically similar terms is challenging. We present a novel approach for automatically determining the degree of relatedness between terms to facilitate their subsequent clustering. By analogy with ensemble classifiers in machine learning, we combine multiple techniques, such as contextual similarity and semantic relatedness, to improve the accuracy of our computations. We present a new method, based on Yarowsky's [9] word sense disambiguation approach, for generating high-quality topic signatures for contextual similarity computations. We also propose a technique for measuring semantic relatedness between multi-word terms, based on the work of Hirst and St. Onge [2]. Experimental evaluation shows that our method outperforms comparable related work. We also investigate the effect of weighting the individual similarity measures differently according to the characteristics of the corpus.
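
    The idea of combining several similarity measures with corpus-dependent weights can be sketched as a weighted average. This is a minimal illustration, not the method from the paper: the abstract does not state the combination rule, so the weighted average, the cosine comparison of topic signatures, the function names, and the toy data below are all assumptions made for this example.

```python
import math
from typing import Callable, Dict

# Hypothetical toy data, invented purely for illustration.
# A "topic signature" is modeled here as a bag of weighted context words.
TOPIC_SIGNATURES: Dict[str, Dict[str, float]] = {
    "bank": {"money": 2.0, "loan": 1.0},
    "credit": {"money": 1.0, "loan": 2.0},
    "river": {"water": 3.0},
}

# Hypothetical precomputed relatedness scores in [0, 1], standing in for a
# lexical-resource-based measure.
RELATEDNESS: Dict[frozenset, float] = {
    frozenset(("bank", "credit")): 0.6,
}


def contextual_similarity(term_a: str, term_b: str) -> float:
    """Cosine similarity between the topic signatures of two terms."""
    sig_a = TOPIC_SIGNATURES.get(term_a, {})
    sig_b = TOPIC_SIGNATURES.get(term_b, {})
    dot = sum(w * sig_b.get(word, 0.0) for word, w in sig_a.items())
    norm_a = math.sqrt(sum(w * w for w in sig_a.values()))
    norm_b = math.sqrt(sum(w * w for w in sig_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_relatedness(term_a: str, term_b: str) -> float:
    """Look up a precomputed relatedness score; unknown pairs score 0."""
    if term_a == term_b:
        return 1.0
    return RELATEDNESS.get(frozenset((term_a, term_b)), 0.0)


MEASURES: Dict[str, Callable[[str, str], float]] = {
    "contextual": contextual_similarity,
    "semantic": semantic_relatedness,
}


def ensemble_similarity(
    term_a: str,
    term_b: str,
    measures: Dict[str, Callable[[str, str], float]],
    weights: Dict[str, float],
) -> float:
    """Weighted average of the individual measures (assumed combination rule)."""
    total_weight = sum(weights[name] for name in measures)
    if total_weight <= 0.0:
        raise ValueError("weights must sum to a positive value")
    combined = sum(weights[name] * fn(term_a, term_b) for name, fn in measures.items())
    return combined / total_weight


if __name__ == "__main__":
    # Equal importance versus a corpus where context is assumed more reliable.
    print(ensemble_similarity("bank", "credit", MEASURES, {"contextual": 0.5, "semantic": 0.5}))
    print(ensemble_similarity("bank", "credit", MEASURES, {"contextual": 0.75, "semantic": 0.25}))
```

    The resulting pairwise scores could then be passed to any standard clustering algorithm. Changing the weights is one simple way to express that a measure deserves more or less importance for a given corpus.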

    Original language: English
    Title of host publication: 2009 WRI World Congress on Computer Science and Information Engineering, CSIE 2009
    Publisher: IEEE
    Pages: 315-319
    Number of pages: 5
    ISBN (Print): 9780769535074
    Publication status: Published - 2009
    Event: 2009 WRI World Congress on Computer Science and Information Engineering, CSIE 2009 - Los Angeles, CA, United States
    Duration: 31-Mar-2009 to 2-Apr-2009

    Conference

    Conference: 2009 WRI World Congress on Computer Science and Information Engineering, CSIE 2009
    Country/Territory: United States
    City: Los Angeles, CA
    Period: 31/03/2009 to 02/04/2009
