Abstract
Zero-shot translation, translating between language pairs on which a Neural Machine Translation (NMT) system has never been trained, is an emergent property when training the system in multilingual settings. However, naive training for zero-shot NMT easily fails and is sensitive to hyper-parameter settings. The performance typically lags far behind the more conventional pivot-based approach, which translates twice using a third language as a pivot. In this work, we address the degeneracy problem caused by capturing spurious correlations by quantitatively analyzing the mutual information between the language IDs of the source and decoded sentences. Inspired by this analysis, we propose two simple but effective approaches: (1) decoder pre-training; (2) back-translation. These methods show significant improvement (4~22 BLEU points) over vanilla zero-shot translation on three challenging multilingual datasets, and achieve results similar to or better than the pivot-based approach.
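The quantitative analysis mentioned above measures how strongly the language of the decoded sentence is tied to the source language ID. A minimal sketch of computing this empirical mutual information from observed (source language, decoded language) label pairs; the function name and toy data are ours, not from the paper:

```python
from collections import Counter
import math

def mutual_information(pairs):
    """Empirical mutual information (in bits) between source-language IDs
    and the detected language IDs of decoded sentences."""
    n = len(pairs)
    joint = Counter(pairs)                 # joint counts of (src, dec)
    src = Counter(s for s, _ in pairs)     # marginal counts of src
    dec = Counter(d for _, d in pairs)     # marginal counts of dec
    mi = 0.0
    for (s, d), c in joint.items():
        p_joint = c / n
        # log2( p(s, d) / (p(s) * p(d)) )
        mi += p_joint * math.log2(p_joint * n * n / (src[s] * dec[d]))
    return mi

# Degenerate zero-shot behavior: the decoder ignores the target-language
# tag and outputs in the source language, so the decoded language is fully
# determined by the source language -> high mutual information.
degenerate = [("fr", "fr"), ("de", "de")] * 50

# Healthy behavior: the decoded language matches the requested target
# regardless of source language -> mutual information near zero.
healthy = [("fr", "en"), ("de", "en")] * 50
```

High mutual information here signals the spurious correlation the paper targets: the decoder has learned to condition its output language on the source language rather than on the target-language tag.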
URL
https://arxiv.org/abs/1906.01181