Improving sarcasm detection from speech and text through attention-based fusion exploiting the interplay of emotions and sentiments

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Sarcasm detection presents unique challenges in speech technology, particularly for individuals with disorders that affect pitch perception or those lacking contextual auditory cues. While previous research has established the significance of integrating textual, audio, and visual data in sarcasm detection, these studies overlook the interactions between modalities. We propose an approach that synergizes audio, textual, sentiment, and emotion data to enhance sarcasm detection. This involves augmenting sarcastic audio with corresponding text obtained through Automatic Speech Recognition (ASR), supplemented with information from emotion recognition and sentiment analysis. Our methodology leverages the strengths of each modality: emotion recognition algorithms analyze the audio data for affective cues, while sentiment analysis processes the text generated by ASR. The integration of these modalities aims to compensate for limitations in current multimodal approaches by providing complementary cues essential for accurate sarcasm interpretation. Evaluated on only the audio data of the MUStARD++ dataset, our approach surpasses the state-of-the-art model by 4.79% F1-score. Our approach improves sarcasm detection in the audio domain, which is especially beneficial to those with auditory processing challenges. This research highlights the potential of multimodal data fusion in capturing the subtleties of speech perception and understanding, thus contributing to the advancement of speech technology applications.
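To make the fusion idea concrete, the sketch below shows one hedged way an attention-based fusion of audio, ASR-text, emotion, and sentiment embeddings could be wired up. It is not the authors' model: the embedding dimension, the use of PyTorch multi-head attention, the pooling step, and the class names are all illustrative assumptions.

```python
# Illustrative sketch only: a minimal attention-based fusion over per-modality
# embeddings (audio, ASR text, emotion, sentiment); NOT the published model.
# Dimensions, layer choices, and names are assumptions for demonstration.
import torch
import torch.nn as nn


class AttentionFusionSarcasmClassifier(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Cross-modality attention over the four modality embeddings,
        # letting each modality attend to the others (their "interplay").
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(dim, dim // 2),
            nn.ReLU(),
            nn.Linear(dim // 2, 2),  # sarcastic vs. non-sarcastic
        )

    def forward(self, audio, text, emotion, sentiment):
        # Stack modality embeddings into a length-4 "sequence": (batch, 4, dim).
        x = torch.stack([audio, text, emotion, sentiment], dim=1)
        # Self-attention across modalities models their interactions.
        fused, _ = self.attn(x, x, x)
        # Pool the attended representations and classify.
        return self.classifier(fused.mean(dim=1))


if __name__ == "__main__":
    batch, dim = 8, 256
    model = AttentionFusionSarcasmClassifier(dim=dim)
    # Placeholder embeddings standing in for an audio encoder, an ASR-text
    # encoder, emotion-recognition features, and sentiment-analysis features.
    dummy = [torch.randn(batch, dim) for _ in range(4)]
    logits = model(*dummy)
    print(logits.shape)  # torch.Size([8, 2])
```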

Original language: English
Title of host publication: Proceedings of Meetings on Acoustics
Publisher: AIP Conference Proceedings
Number of pages: 12
Volume: 54
Edition: 1
DOIs
Publication status: Published - 31-Jul-2024
Event: 186th Meeting of the Acoustical Society of America and the Canadian Acoustical Association - Ottawa, Canada
Duration: 13-May-2024 to 17-May-2024

Conference

Conference: 186th Meeting of the Acoustical Society of America and the Canadian Acoustical Association
Country/Territory: Canada
City: Ottawa
Period: 13/05/2024 to 17/05/2024

Keywords

  • sarcasm
  • multimodal
  • speech and language
