Paper Reading AI Learner

Consistency by Agreement in Zero-shot Neural Machine Translation

2019-04-04 03:49:05
Maruan Al-Shedivat, Ankur P. Parikh

Abstract

Generalization and reliability of multilingual translation often highly depend on the amount of available parallel data for each language pair of interest. In this paper, we focus on zero-shot generalization, a challenging setup that tests models on translation directions they have not been optimized for at training time. To solve the problem, we (i) reformulate multilingual translation as probabilistic inference, (ii) define the notion of zero-shot consistency and show why standard training often results in models unsuitable for zero-shot tasks, and (iii) introduce a consistent agreement-based training method that encourages the model to produce equivalent translations of parallel sentences in auxiliary languages. We test our multilingual NMT models on multiple public zero-shot translation benchmarks (IWSLT17, UN corpus, Europarl) and show that agreement-based learning often results in 2-3 BLEU zero-shot improvement over strong baselines without any loss in performance on supervised translation directions.
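The agreement idea in point (iii) can be illustrated with a toy sketch. It assumes that agreement is scored by greedily translating each sentence of a parallel pair into an auxiliary language and then evaluating that translation's log-likelihood under the model conditioned on the *other* sentence. The "model" is a hypothetical lookup table of per-position token distributions, not a neural NMT system. The names `TOY_MODEL`, `greedy_decode`, `log_likelihood` and `agreement_score` are illustrative only and are not the authors' code or API.

```python
import math
from typing import Dict, List

# Hypothetical toy model: for each source sentence, one categorical
# distribution over auxiliary-language tokens per output position.
# A real system would be a neural multilingual NMT model instead.
ToyModel = Dict[str, List[Dict[str, float]]]

TOY_MODEL: ToyModel = {
    "en": [{"hallo": 0.9, "welt": 0.1}, {"hallo": 0.2, "welt": 0.8}],
    "fr": [{"hallo": 0.6, "welt": 0.4}, {"hallo": 0.3, "welt": 0.7}],
}


def greedy_decode(model: ToyModel, src: str) -> List[str]:
    """Pick the most probable auxiliary-language token at each position."""
    return [max(dist, key=dist.get) for dist in model[src]]


def log_likelihood(model: ToyModel, src: str, tgt: List[str]) -> float:
    """Sum of per-position log-probabilities of tgt given src."""
    return sum(math.log(dist[tok]) for dist, tok in zip(model[src], tgt))


def agreement_score(model: ToyModel, x_a: str, x_b: str) -> float:
    """Cross-score each sentence's greedy translation under the other sentence.

    Higher values mean the two parallel sentences lead the model to more
    similar auxiliary-language translations. In a training setup one would
    (by assumption here) add this term to the supervised objective.
    """
    z_a = greedy_decode(model, x_a)  # auxiliary translation of x_a
    z_b = greedy_decode(model, x_b)  # auxiliary translation of x_b
    return log_likelihood(model, x_b, z_a) + log_likelihood(model, x_a, z_b)


score = agreement_score(TOY_MODEL, "en", "fr")
print(score)
```

Here both sentences decode to `["hallo", "welt"]`, so the score is simply the log of the product of the corresponding probabilities under each source. Using greedy decoding keeps the sketch deterministic; the decoded sequences are treated as fixed targets rather than something to differentiate through.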


URL

https://arxiv.org/abs/1904.02338

PDF

https://arxiv.org/pdf/1904.02338.pdf

