Abstract
Neural Machine Translation (NMT) is the task of translating text from one language to another using a trained neural network. Several existing works incorporate external information (e.g. sentiment, politeness, gender) into NMT models to improve or control the predicted translations. In this work, we propose to improve translation quality by adding another external source of information: the emotion automatically recognized in the speaker's voice. This work is motivated by the assumption that each emotion is associated with a specific lexicon, and that these lexicons can overlap between emotions. Our proposed method follows a two-stage procedure. First, we select a state-of-the-art Speech Emotion Recognition (SER) model to predict dimensional emotion values for every input audio recording in the dataset. Then, we add these predicted emotions as source tokens at the beginning of the input texts used to train our NMT model. We show that integrating emotion information, especially arousal, into NMT systems leads to better translations.
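The second stage, adding predicted emotions as source tokens, can be illustrated with a minimal sketch. The abstract does not specify the token format or how continuous values are turned into tokens, so everything below is an assumption made for illustration: scores are taken to lie in [0, 1], a threshold of 0.5 splits them into "high" and "low", and the names `emotion_tokens` and `tag_source` as well as the `<arousal_high>` tag format are hypothetical.

```python
from typing import List, Mapping


def emotion_tokens(scores: Mapping[str, float], threshold: float = 0.5) -> List[str]:
    """Turn dimensional emotion scores into coarse tags.

    Assumption: each score is in [0, 1] and is binarized at `threshold`.
    The paper's actual discretization may differ.
    """
    tokens = []
    # Sort dimension names so the tag order is deterministic.
    for dim in sorted(scores):
        level = "high" if scores[dim] >= threshold else "low"
        tokens.append(f"<{dim}_{level}>")
    return tokens


def tag_source(text: str, scores: Mapping[str, float], threshold: float = 0.5) -> str:
    """Prepend the emotion tags to the source sentence, separated by spaces."""
    return " ".join(emotion_tokens(scores, threshold) + [text])


if __name__ == "__main__":
    # Hypothetical SER output for one utterance (arousal only).
    print(tag_source("I can't believe it!", {"arousal": 0.82}))
```

In a real pipeline the tags would also have to be registered with the NMT tokenizer as indivisible symbols, so that they are not split into subword pieces; how that is done depends on the toolkit and is not covered by this sketch.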
URL
https://arxiv.org/abs/2404.17968