Abstract
Generalization and reliability of multilingual translation often depend heavily on the amount of parallel data available for each language pair of interest. In this paper, we focus on zero-shot generalization, a challenging setup that tests models on translation directions they have not been optimized for at training time. To address this, we (i) reformulate multilingual translation as probabilistic inference, (ii) define the notion of zero-shot consistency and show why standard training often yields models unsuitable for zero-shot tasks, and (iii) introduce a consistent, agreement-based training method that encourages the model to produce equivalent translations of parallel sentences in auxiliary languages. We evaluate our multilingual NMT models on multiple public zero-shot translation benchmarks (IWSLT17, UN corpus, Europarl) and show that agreement-based learning often yields a 2-3 BLEU zero-shot improvement over strong baselines without any loss in performance on supervised translation directions.
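As a rough illustration of the agreement idea the abstract describes, the sketch below computes a symmetric cross-entropy between two next-token distributions over an auxiliary language: one predicted from the source sentence x, one from its parallel sentence y. This is a toy stand-in under assumed simplifications (single token position, plain log-probabilities, no sampling); the function names and the exact form of the term are illustrative, not taken from the paper.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)
    log_z = math.log(sum(math.exp(l - m) for l in logits)) + m
    return [l - log_z for l in logits]

def agreement_loss(logits_z_given_x, logits_z_given_y):
    """Symmetric cross-entropy between the model's distributions over
    auxiliary-language tokens z predicted from x and from y.

    A toy version of an agreement penalty: it is smallest when the two
    distributions coincide, i.e. when translating x and translating its
    parallel sentence y into the auxiliary language agree.
    """
    lp_x = log_softmax(logits_z_given_x)
    lp_y = log_softmax(logits_z_given_y)
    p_x = [math.exp(l) for l in lp_x]
    p_y = [math.exp(l) for l in lp_y]
    # -E_{p(z|x)}[log p(z|y)] - E_{p(z|y)}[log p(z|x)]
    return (-sum(px * ly for px, ly in zip(p_x, lp_y))
            - sum(py * lx for py, lx in zip(p_y, lp_x)))
```

Training the supervised directions with such a term added to the usual likelihood objective is one way to push the model toward zero-shot consistency: identical predictions incur only the (unavoidable) entropy cost, while disagreeing predictions pay an extra KL-divergence penalty in each direction.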
URL
https://arxiv.org/abs/1904.02338