An Analysis of Cross-Genre and In-Genre Performance for Author Profiling in Social Media

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

    8 Citations (Scopus)


    User profiling on social media data is normally done within a supervised setting. A typical feature of supervised models that are trained on data from a specific genre is their limited portability to other genres. Cross-genre models were developed in the context of PAN 2016, where systems were trained on tweets and tested on other non-tweet social media data. Did the model that achieved the best results at this task get lucky, or was it truly designed in a cross-genre manner, with features general enough to capture demographics beyond Twitter? We explore this question via a series of in-genre and cross-genre experiments on English and Spanish using the best-performing system at PAN 2016, and discover that portability is successful to a certain extent, provided that the sub-genres involved are close enough. In such cases, it is also more beneficial to do cross-genre than in-genre modelling if the cross-genre setting can benefit from larger amounts of training data than those available in-genre.
    Original language: English
    Title of host publication: Experimental IR Meets Multilinguality, Multimodality, and Interaction
    Subtitle of host publication: 8th International Conference of the CLEF Association, CLEF 2017, Dublin, Ireland, September 11–14, 2017, Proceedings
    Number of pages: 13
    ISBN (Electronic): 978-3-319-65813-1, 978-3-319-65812-4
    Publication status: Published - 2017
