Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations

2019-06-04 03:30:22
Jiatao Gu, Yong Wang, Kyunghyun Cho, Victor O.K. Li

Abstract

Zero-shot translation, translating between language pairs on which a Neural Machine Translation (NMT) system has never been trained, is an emergent property when training the system in multilingual settings. However, naive training for zero-shot NMT easily fails, and is sensitive to hyper-parameter setting. The performance typically lags far behind the more conventional pivot-based approach which translates twice using a third language as a pivot. In this work, we address the degeneracy problem due to capturing spurious correlations by quantitatively analyzing the mutual information between language IDs of the source and decoded sentences. Inspired by this analysis, we propose to use two simple but effective approaches: (1) decoder pre-training; (2) back-translation. These methods show significant improvement (4~22 BLEU points) over the vanilla zero-shot translation on three challenging multilingual datasets, and achieve similar or better results than the pivot-based approach.
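The mutual-information analysis mentioned in the abstract can be illustrated with a small sketch. It assumes we have a list of (source language ID, decoded language ID) pairs, for example obtained by running a language identifier over a model's outputs, and it computes the empirical mutual information between the two. The function name and the toy data are hypothetical, and this is a plain plug-in estimate, not necessarily the estimator the authors use. A value near zero means the decoded language does not depend on the source language; a large value suggests the kind of spurious dependence the abstract describes.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Empirical mutual information (in nats) between two discrete variables.

    `pairs` is an iterable of (source_lang_id, decoded_lang_id) tuples.
    This is a hypothetical helper for illustration only.
    """
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one pair")

    joint = Counter(pairs)                   # counts of (source, decoded)
    src = Counter(s for s, _ in pairs)       # marginal counts of source IDs
    out = Counter(o for _, o in pairs)       # marginal counts of decoded IDs

    mi = 0.0
    for (s, o), c in joint.items():
        # p(s, o) * log( p(s, o) / (p(s) * p(o)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (src[s] * out[o]))
    return mi


# Toy data (made up): decoded language is independent of the source language.
independent = [("en", "de"), ("en", "fr"), ("fr", "de"), ("fr", "fr")]
# Toy data (made up): decoded language is fully determined by the source language.
dependent = [("en", "fr"), ("fr", "en")]

print(mutual_information(independent))
print(mutual_information(dependent))
```

In the second toy case the decoded language can be predicted from the source language alone, so the estimate equals the entropy of the source language ID (log 2 for two equally likely languages).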

URL

https://arxiv.org/abs/1906.01181

PDF

https://arxiv.org/pdf/1906.01181.pdf

