Paper Reading AI Learner

Usefulness of Emotional Prosody in Neural Machine Translation

2024-04-27 18:04:28
Charles Brazier, Jean-Luc Rouas

Abstract

Neural Machine Translation (NMT) is the task of translating a text from one language to another using a trained neural network. Several existing works incorporate external information into NMT models to improve or control the predicted translations (e.g., sentiment, politeness, gender). In this work, we propose to improve translation quality by adding another external source of information: the emotion automatically recognized from the speaker's voice. This work is motivated by the assumption that each emotion is associated with a specific lexicon, and that these lexicons can overlap between emotions. Our proposed method follows a two-stage procedure. First, we select a state-of-the-art Speech Emotion Recognition (SER) model to predict dimensional emotion values for every input audio in the dataset. Then, we use these predicted emotions as source tokens added at the beginning of the input texts to train our NMT model. We show that integrating emotion information, especially arousal, into NMT systems leads to better translations.
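The second stage (prepending predicted emotions as source tokens) can be illustrated with a minimal sketch. The abstract does not say how the continuous SER outputs are turned into tokens, so everything below is an assumption: the `DimensionalEmotion` container, the arousal/valence/dominance dimensions scaled to [0, 1], the equal-width binning, and the `<arousal_k>` token format are all hypothetical choices for illustration, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class DimensionalEmotion:
    """Hypothetical container for SER predictions; each value assumed in [0, 1]."""
    arousal: float
    valence: float
    dominance: float


def to_bin(value: float, n_bins: int) -> int:
    """Map a value in [0, 1] to an equal-width bin index 0..n_bins-1.

    Out-of-range values are clamped so that 1.0 lands in the last bin.
    """
    clamped = min(max(value, 0.0), 1.0)
    return min(int(clamped * n_bins), n_bins - 1)


def emotion_tokens(
    emo: DimensionalEmotion,
    n_bins: int = 5,
    dims: tuple[str, ...] = ("arousal",),
) -> list[str]:
    """Build one hypothetical discrete token per selected emotion dimension."""
    return [f"<{d}_{to_bin(getattr(emo, d), n_bins)}>" for d in dims]


def tag_source(
    text: str,
    emo: DimensionalEmotion,
    n_bins: int = 5,
    dims: tuple[str, ...] = ("arousal",),
) -> str:
    """Prepend the emotion tokens to the source sentence before NMT training."""
    return " ".join(emotion_tokens(emo, n_bins, dims) + [text])


if __name__ == "__main__":
    emo = DimensionalEmotion(arousal=0.9, valence=0.1, dominance=0.5)
    print(tag_source("I can't believe we won!", emo))
    print(emotion_tokens(emo, dims=("arousal", "valence", "dominance")))
```

With arousal predicted at 0.9 and five bins, the source sentence becomes `<arousal_4> I can't believe we won!`. Defaulting to arousal alone mirrors the abstract's finding that arousal was the most useful dimension. In a real pipeline the emotion tokens would also need to be registered in the NMT vocabulary so that subword segmentation keeps each one intact.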

URL

https://arxiv.org/abs/2404.17968

PDF

https://arxiv.org/pdf/2404.17968.pdf
