Abstract
Sarcasm is a frequently used linguistic device that is expressed in a multitude of ways, through both acoustic cues (including pitch, intonation, and intensity) and visual cues (including facial expression and eye gaze). While the cues used in the expression of sarcasm are well described in the literature, there is a striking paucity of attempts to perform automatic sarcasm detection in speech. To explore this gap, we present a methodology for Inductive Transfer Learning (ITL) based on pre-trained Deep Convolutional Neural Networks (DCNNs) to detect sarcasm in speech. The multimodal dataset MUStARD serves as the target dataset in this study. The two selected pre-trained DCNN models are Xception and VGGish, which were pre-trained on visual and audio datasets, respectively. Results show that VGGish, applied as a feature extractor in the experiment, performs better than Xception, whose convolutional and pooling layers are retrained. Both models achieve a higher F-score than the baseline Support Vector Machine (SVM) model, by 7% and 5%, in unimodal sarcasm detection in speech.
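The "feature extractor" strategy the abstract describes for VGGish (freeze the pre-trained layers, train only a new classification head on the target data) can be sketched as follows. This is a minimal, hedged illustration: a small fixed random projection stands in for the frozen pre-trained network, and synthetic data stands in for MUStARD features; neither VGGish nor the actual dataset is used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a frozen pre-trained feature extractor (hypothetical;
# in the paper this role is played by VGGish's convolutional stack).
W_pre = rng.normal(size=(20, 8))  # frozen weights, never updated below


def extract_features(x):
    """Feature-extractor mode: pass inputs through the frozen layers."""
    return np.tanh(x @ W_pre)


# Synthetic binary task standing in for sarcastic vs. non-sarcastic clips.
X = rng.normal(size=(200, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

# Inductive transfer: only a new logistic-regression head is trained on
# top of the frozen features, as in the VGGish configuration.
F = extract_features(X)
w = np.zeros(8)
b = 0.0
lr = 0.5
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))  # sigmoid predictions
    w -= lr * (F.T @ (p - y)) / len(y)      # gradient step on head only
    b -= lr * np.mean(p - y)

acc = float(np.mean(((F @ w + b) > 0) == (y == 1)))
print(f"head-only training accuracy: {acc:.2f}")
```

The alternative strategy mentioned for Xception, retraining the convolutional and pooling layers, would correspond here to also updating `W_pre` during the gradient loop rather than keeping it fixed.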
Original language | English |
---|---|
Title of host publication | Proceedings of Interspeech 2022 |
Publisher | ISCA |
Pages | 2323-2327 |
Number of pages | 5 |
Publication status | Published - 18-Sept-2022 |
Event | 23rd Interspeech Conference - Incheon, Korea, Republic of. Duration: 18-Sept-2022 → 22-Sept-2022. https://www.interspeech2022.org/ |
Conference
Conference | 23rd Interspeech Conference |
---|---|
Country/Territory | Korea, Republic of |
City | Incheon |
Period | 18/09/2022 → 22/09/2022 |
Internet address | https://www.interspeech2022.org/ |
Keywords
- sarcasm detection
- Inductive Transfer Learning
- speech recognition
- human-computer interaction