Categorizing Children: Automated Text Classification of CHILDES files

Rob Opsomer, Peter Knoth, Marco Wiering, Freek van Polen, Jantine Trapman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

In this paper we present the application of machine learning text classification methods to two tasks: categorization of children’s speech in the CHILDES Database according to gender and according to age. Both tasks are binary. For age, we distinguish two groups of children aged between 1.9 and 3.0 years. The boundary between the groups lies at the age of 2.4, which is both the mean and the median age in our data set. We show that the machine learning approach, based on a bag of words, can achieve much better results than features such as average utterance length or Type-Token Ratio, which are measures traditionally used by linguists. We have achieved 80.5% and 70.5% classification accuracy for the age and gender tasks, respectively.
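
The three kinds of features named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenizer, the function names, the sample utterances, and the choice to measure utterance length in words are all assumptions made for illustration, and real CHILDES transcripts would need proper parsing of the CHAT format.

```python
from collections import Counter

def tokenize(utterance):
    # Assumption: lowercase and split on whitespace. Real CHILDES data
    # uses the CHAT transcription format and needs a dedicated parser.
    return utterance.lower().split()

def bag_of_words(utterances):
    # Count how often each word occurs, ignoring word order.
    counts = Counter()
    for utterance in utterances:
        counts.update(tokenize(utterance))
    return counts

def type_token_ratio(utterances):
    # Number of distinct words (types) divided by total words (tokens).
    bag = bag_of_words(utterances)
    total = sum(bag.values())
    return len(bag) / total if total else 0.0

def mean_utterance_length(utterances):
    # Assumption: length is measured in words, not morphemes.
    lengths = [len(tokenize(u)) for u in utterances]
    return sum(lengths) / len(lengths) if lengths else 0.0

# Hypothetical sample utterances, invented for this example.
sample = ["more juice", "I want more juice", "doggie go"]
bag = bag_of_words(sample)
ttr = type_token_ratio(sample)
mlu = mean_utterance_length(sample)
```

The word counts in `bag` form a high-dimensional feature vector that a classifier can use, whereas the Type-Token Ratio and the average utterance length each reduce a transcript to a single number.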
Original language: English
Title of host publication: Proceedings of the 20th Belgium-Netherlands Conference on Artificial Intelligence
Subtitle of host publication: BNAIC
Pages: 209-216
Publication status: Published - 2008

Keywords

  • Text Classification
  • Machine learning
