Deep CNN-based Inductive Transfer Learning for Sarcasm Detection in Speech

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

3 Citations (Scopus)
279 Downloads (Pure)

Abstract

Sarcasm is a frequently used linguistic device which is expressed in a multitude of ways, both with acoustic cues (including pitch, intonation, intensity, etc.) and visual cues (including facial expression, eye gaze, etc.). While the cues used in the expression of sarcasm are well described in the literature, there is a striking paucity of attempts to perform automatic sarcasm detection in speech. To explore this gap, we develop a methodology for implementing Inductive Transfer Learning (ITL) based on pre-trained Deep Convolutional Neural Networks (DCNNs) to detect sarcasm in speech. To this end, the multimodal dataset MUStARD is used as the target dataset in this study. The two selected pre-trained DCNN models are Xception and VGGish, which were pre-trained on visual and audio datasets, respectively. Results show that VGGish, which is applied as a feature extractor in the experiment, performs better than Xception, which has its convolutional layers and pooling layers retrained. Both models achieve a higher F-score than the baseline Support Vector Machines (SVM) model, by 7% and 5%, in unimodal sarcasm detection in speech.
Original language: English
Title of host publication: Proceedings of Interspeech 2022
Publisher: ISCA
Pages: 2323-2327
Number of pages: 5
Publication status: Published - 18-Sept-2022
Event: 23rd Interspeech Conference - Incheon, Korea, Republic of
Duration: 18-Sept-2022 → 22-Sept-2022
https://www.interspeech2022.org/

Conference

Conference: 23rd Interspeech Conference
Country/Territory: Korea, Republic of
City: Incheon
Period: 18/09/2022 → 22/09/2022

Keywords

  • sarcasm detection
  • Inductive Transfer Learning
  • speech recognition
  • human-computer interaction
