Comparison of machine learning techniques for multi-label genre classification

M. Pieters, M. Wiering

Research output: Chapter in Book/Report/Conference proceeding › Chapter › Academic › peer-review

4 Citations (Scopus)

Abstract

We compare classic text classification techniques with more recent machine learning techniques and introduce a novel architecture that outperforms many state-of-the-art approaches. These techniques are evaluated on a new multi-label classification task: predicting a movie's genres from its subtitles. We show that pre-trained word embeddings contain ‘universal’ features by using the Semantic-Syntactic Word Relationship test. Furthermore, we explore the effectiveness of a convolutional neural network (CNN), which can extract local features, and a long short-term memory (LSTM) network, which can find time-dependent relationships. Combining a CNN with an LSTM yields a strong performance improvement. The best-performing technique is a multi-layer perceptron that takes the bag-of-words model as input.
Original language: English
Title of host publication: Artificial Intelligence
Subtitle of host publication: 29th Benelux Conference, BNAIC 2017, Groningen, The Netherlands, November 8–9, 2017, Revised Selected Papers
Editors: Bart Verheij, Marco Wiering
Place of Publication: Cham
Publisher: Springer International Publishing AG
Pages: 131-145
ISBN (Electronic): 978-3-319-76892-2
ISBN (Print): 978-3-319-76891-5
Publication status: Published - 2018
Event: 29th Benelux Conference, BNAIC 2017 - Groningen, Netherlands
Duration: 8-Nov-2017 to 9-Nov-2017

Publication series

Name: Communications in Computer and Information Science
Volume: 823
ISSN (Print): 1865-0929

Conference

Conference: 29th Benelux Conference, BNAIC 2017
Country: Netherlands
City: Groningen
Period: 08/11/2017 to 09/11/2017

Keywords

  • Bag-of-words model
  • CNN model
  • LSTM network
  • Movie subtitles
  • Multi-label text classification
  • Natural language processing
