Abstract
Sarcastic speech synthesis, the generation of speech that conveys sarcasm, has potential applications in areas such as entertainment and human-computer interaction. This study presents a first attempt to apply transfer learning from a dataset of diverse speech styles to the challenging domain of sarcastic speech synthesis. The limited availability of sarcastic speech data makes it difficult to capture the expressive nature of sarcasm. To address this, a pre-trained model is fine-tuned on a dataset encompassing various speech styles, including sarcastic speech. The synthesized speech still contains some robotic elements, indicating that transfer learning yields moderate improvements in sarcastic speech synthesis. Future work will explore multi-modal approaches to further enhance the expressiveness and naturalness of generated sarcastic speech.
Original language | English |
---|---|
Title of host publication | Proceedings 12th ISCA Speech Synthesis Workshop (SSW2023) |
Publisher | ISCA |
Pages | 242-243 |
Number of pages | 2 |
Publication status | Published - Aug-2023 |
Event | 12th ISCA Speech Synthesis Workshop (SSW2023) - Grenoble, France; Duration: 26-Aug-2023 → 28-Aug-2023 |
Conference
Conference | 12th ISCA Speech Synthesis Workshop (SSW2023) |
---|---|
Country/Territory | France |
City | Grenoble |
Period | 26/08/2023 → 28/08/2023 |