ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings

2023-05-23 06:19:37

Yuki Saito, Shinnosuke Takamichi, Eiji Iimori, Kentaro Tachibana, Hiroshi Saruwatari

arXiv_CL

Abstract

We propose ChatGPT-EDSS, an empathetic dialogue speech synthesis (EDSS) method that uses ChatGPT to extract dialogue context. ChatGPT is a chatbot that can deeply understand the content and purpose of an input prompt and respond appropriately to the user's request. We focus on ChatGPT's reading comprehension and apply it to EDSS, the task of synthesizing speech that can empathize with the interlocutor's emotion. Our method first gives the chat history to ChatGPT and asks it to generate three words representing the intention, emotion, and speaking style for each line in the chat. It then trains an EDSS model using the embeddings of the ChatGPT-derived context words as conditioning features. The experimental results demonstrate that our method performs comparably to methods that use emotion labels or neural network-derived context embeddings learned from chat histories. The collected ChatGPT-derived context information is available at this https URL.
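
The two steps described in the abstract (ask for three context words per chat line, then turn those words into conditioning features) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the prompt wording, the reply format, the function names, the hash-based toy embedding, and the choice to concatenate the three word vectors are all hypothetical, since the abstract does not specify them. No real ChatGPT call is made; the reply is a hand-written stand-in.

```python
import hashlib
from typing import Dict, List

# The three aspects named in the abstract.
ASPECTS = ("intention", "emotion", "speaking style")


def build_prompt(chat_history: List[str]) -> str:
    """Build a hypothetical prompt asking for three words per chat line."""
    numbered = "\n".join(f"{i + 1}. {line}" for i, line in enumerate(chat_history))
    return (
        "For each line of the chat below, give three words describing the "
        "intention, emotion, and speaking style, formatted as "
        "'<line number>: <intention>, <emotion>, <style>'.\n\n" + numbered
    )


def parse_context_words(reply: str, num_lines: int) -> List[Dict[str, str]]:
    """Parse a reply in the assumed 'N: w1, w2, w3' format, one row per line."""
    parsed: Dict[int, Dict[str, str]] = {}
    for row in reply.strip().splitlines():
        index, _, words = row.partition(":")
        parts = [w.strip().lower() for w in words.split(",")]
        if index.strip().isdigit() and len(parts) == len(ASPECTS):
            parsed[int(index)] = dict(zip(ASPECTS, parts))
    missing = [i for i in range(1, num_lines + 1) if i not in parsed]
    if missing:
        raise ValueError(f"no context words for chat lines {missing}")
    return [parsed[i] for i in range(1, num_lines + 1)]


def toy_word_embedding(word: str, dim: int = 4) -> List[float]:
    """Deterministic stand-in for a real word embedding (assumption)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dim)]


def conditioning_vector(context: Dict[str, str], dim: int = 4) -> List[float]:
    """Concatenate the three word embeddings into one feature vector."""
    vector: List[float] = []
    for aspect in ASPECTS:
        vector.extend(toy_word_embedding(context[aspect], dim))
    return vector


# Usage with a hand-written reply standing in for ChatGPT's output.
chat = ["I failed my exam again.", "That sounds really tough. I'm here for you."]
prompt = build_prompt(chat)
fake_reply = "1: confide, sad, quiet\n2: comfort, sympathetic, gentle"
contexts = parse_context_words(fake_reply, num_lines=len(chat))
features = [conditioning_vector(c) for c in contexts]
```

In a real system the toy embedding would be replaced by a pretrained word or sentence embedding model, and each per-line feature vector would be passed to the speech synthesis model as a conditioning input alongside the text.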

URL

https://arxiv.org/abs/2305.13724

PDF

https://arxiv.org/pdf/2305.13724.pdf