AMuSeD: An Attentive Deep Neural Network for Multimodal Sarcasm Detection Incorporating Bi-modal Data Augmentation

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Detecting sarcasm effectively requires a nuanced understanding of context, including vocal tone and facial expressions. Progress toward multimodal computational methods for sarcasm detection, however, is hindered by the scarcity of data. To address this, we present AMuSeD (Attentive deep neural network for MUltimodal Sarcasm dEtection incorporating bi-modal Data augmentation). This approach uses the Multimodal Sarcasm Detection Dataset (MUStARD) and introduces a two-phase bimodal data augmentation strategy. The first phase generates varied text samples through back-translation from several secondary languages. The second phase fine-tunes a FastSpeech2-based speech synthesis system specifically for sarcasm, so that it retains sarcastic intonation. Alongside a cloud-based Text-to-Speech (TTS) service, this fine-tuned FastSpeech2 system produces the corresponding audio for the text augmentations. We also evaluate various attention mechanisms for selectively enhancing sarcasm-relevant features, finding self-attention to be the most efficient. Our experiments show that the proposed approach achieves an F1-score of 81.0% on the text-audio modalities, surpassing even models that use all three modalities of the MUStARD dataset.

Original language: English
Number of pages: 14
Journal: IEEE Transactions on Affective Computing
Publication status: E-pub ahead of print - 2-Dec-2025

Keywords

  • attention mechanisms
  • data augmentation
  • multimodality
  • sarcasm detection
  • speech synthesis
