Abstract
In this work, we present a taxonomy of error categories for lexical normalization, the task of translating user-generated content into canonical language. To test the practical use of the taxonomy, we annotate a recent normalization dataset and reach near-perfect inter-annotator agreement. We then use this annotated dataset to evaluate how an existing normalization model performs on each category of the taxonomy. The evaluation reveals that some of the problematic categories involve only minor transformations, whereas most regular transformations are handled quite well.
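The per-category evaluation described in the abstract can be illustrated with a minimal Python sketch. It assumes that each annotated token carries a gold normalization and exactly one taxonomy category, and it scores a model by exact-match accuracy within each category. The function name `per_category_accuracy`, the category labels, and the example words are hypothetical illustrations. They are not the paper's actual taxonomy, data, or evaluation code, and the paper's metric may differ.

```python
from collections import defaultdict


def per_category_accuracy(tokens, predictions):
    """Return {category: share of tokens whose prediction matches the gold normalization}.

    `tokens` is a list of (raw_word, gold_normalization, category) triples;
    `predictions` holds one predicted normalization per token, in the same order.
    """
    if len(tokens) != len(predictions):
        raise ValueError("need exactly one prediction per annotated token")
    correct = defaultdict(int)
    total = defaultdict(int)
    for (_raw, gold, category), pred in zip(tokens, predictions):
        total[category] += 1
        if pred == gold:  # exact string match counts as a correct normalization
            correct[category] += 1
    return {cat: correct[cat] / total[cat] for cat in total}


# Hypothetical annotated tokens; the category labels are placeholders only.
tokens = [
    ("u", "you", "abbreviation"),
    ("2morrow", "tomorrow", "abbreviation"),
    ("coool", "cool", "repetition"),
    ("sooo", "so", "repetition"),
    ("the", "the", "unchanged"),
]
# Hypothetical model output: it leaves "2morrow" unnormalized.
predictions = ["you", "2morrow", "cool", "so", "the"]

print(per_category_accuracy(tokens, predictions))
# → {'abbreviation': 0.5, 'repetition': 1.0, 'unchanged': 1.0}
```

Breaking the score down by category, rather than reporting one overall accuracy, is what lets such an evaluation show which kinds of transformations a model still struggles with.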
Original language | English |
---|---|
Title of host publication | LREC 2018, Eleventh International Conference on Language Resources and Evaluation |
Place of publication | Paris |
Publisher | European Language Resources Association (ELRA) |
Pages | 684-688 |
Number of pages | 5 |
ISBN (print) | 979-10-95546-00-9 |
Status | Published - 2018 |
Event | Eleventh International Conference on Language Resources and Evaluation - Phoenix Seagaia Resort, Miyazaki, Japan. Duration: 7 May 2018 → 12 May 2018 http://lrec2018.lrec-conf.org/en/ |
Conference
Conference | Eleventh International Conference on Language Resources and Evaluation |
---|---|
Acronym | LREC 2018 |
Country/Territory | Japan |
City | Miyazaki |
Period | 07/05/2018 → 12/05/2018 |
Internet address | http://lrec2018.lrec-conf.org/en/ |
Fingerprint
Dive into the research topics of 'A Taxonomy for In-depth Evaluation of Normalization for User Generated Content'. Together they form a unique fingerprint.
Datasets
Taxonomy for Normalization
Goot, van der, R. (Creator) & van Noord, R. (Creator), University of Groningen, 2018
https://bitbucket.org/robvanderg/normtax
Dataset