Paper Reading AI Learner

A Multi-task Multi-stage Transitional Training Framework for Neural Chat Translation

2023-01-27 14:41:16
Chulun Zhou, Yunlong Liang, Fandong Meng, Jie Zhou, Jinan Xu, Hongji Wang, Min Zhang, Jinsong Su

Abstract

Neural chat translation (NCT) aims to translate a cross-lingual chat between speakers of different languages. Existing context-aware NMT models cannot achieve satisfactory performance due to the following inherent problems: 1) limited resources of annotated bilingual dialogues; 2) the neglect of modelling conversational properties; 3) training discrepancy between different stages. To address these issues, in this paper, we propose a multi-task multi-stage transitional (MMT) training framework, where an NCT model is trained using the bilingual chat translation dataset and additional monolingual dialogues. We design two auxiliary tasks, namely utterance discrimination and speaker discrimination, to introduce the modelling of dialogue coherence and speaker characteristics into the NCT model. The training process consists of three stages: 1) sentence-level pre-training on a large-scale parallel corpus; 2) intermediate training with auxiliary tasks using additional monolingual dialogues; 3) context-aware fine-tuning with a gradual transition. In particular, the second stage serves as an intermediate phase that alleviates the training discrepancy between the pre-training and fine-tuning stages. Moreover, to make the stage transition smoother, we train the NCT model using a gradual transition strategy, i.e., gradually transitioning from monolingual to bilingual dialogues. Extensive experiments on two language pairs demonstrate the effectiveness and superiority of our proposed training framework.
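
The abstract describes two mechanisms that lend themselves to a short illustration: a multi-task objective combining the translation loss with the utterance- and speaker-discrimination losses, and a gradual transition that shifts training batches from monolingual to bilingual dialogues. The Python sketch below is a minimal reading of the abstract, not the paper's implementation; the loss weights `lambda_ud` and `lambda_sd`, the linear schedule, and all function names are assumptions.

```python
import random

# Assumed multi-task objective: the NCT translation loss plus the two
# auxiliary discrimination losses described in the abstract, with
# hypothetical interpolation weights (the paper's exact formulation
# and weights are not given in the abstract).
def multi_task_loss(nct_loss, ud_loss, sd_loss, lambda_ud=1.0, lambda_sd=1.0):
    return nct_loss + lambda_ud * ud_loss + lambda_sd * sd_loss

# Assumed gradual transition (stage 2 -> stage 3): sample each batch from
# bilingual dialogues with a probability that grows from 0 to 1 over the
# transition window, so training shifts smoothly from monolingual to
# bilingual data. A linear schedule is assumed here for concreteness.
def sample_batch(step, transition_steps, monolingual_batches, bilingual_batches):
    p_bilingual = min(step / transition_steps, 1.0)
    if random.random() < p_bilingual:
        return random.choice(bilingual_batches), "bilingual"
    return random.choice(monolingual_batches), "monolingual"

# Usage sketch: early steps draw mostly monolingual batches, late steps
# mostly bilingual ones.
if __name__ == "__main__":
    mono, bi = [["mono_batch"]], [["bi_batch"]]
    for step in (0, 500, 1000):
        _, source = sample_batch(step, transition_steps=1000,
                                 monolingual_batches=mono,
                                 bilingual_batches=bi)
        print(step, source)
```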

URL

https://arxiv.org/abs/2301.11749

PDF

https://arxiv.org/pdf/2301.11749.pdf

