Universal Conditional Masked Language Pre-training for Neural Machine Translation

2022-03-17 10:00:33

Pengfei Li, Liangyou Li, Meng Zhang, Minghao Wu, Qun Liu

arXiv_CL

Abstract

Pre-trained sequence-to-sequence models have significantly improved Neural Machine Translation (NMT). Unlike prior works, where pre-trained models usually adopt a unidirectional decoder, this paper demonstrates that pre-training a sequence-to-sequence model with a bidirectional decoder can produce notable performance gains for both Autoregressive and Non-autoregressive NMT. Specifically, we propose CeMAT, a conditional masked language model pre-trained on large-scale bilingual and monolingual corpora in many languages. We also introduce two simple but effective methods to enhance CeMAT: aligned code-switching & masking, and dynamic dual-masking. We conduct extensive experiments and show that CeMAT achieves significant performance improvements in all scenarios, from low-resource to extremely high-resource settings, e.g., up to 14.4 BLEU on low-resource tasks and 7.9 BLEU on average for Autoregressive NMT. For Non-autoregressive NMT, we demonstrate that it also produces consistent performance gains of up to 5.3 BLEU. As far as we know, this is the first work to pre-train a unified model for fine-tuning on both Autoregressive and Non-autoregressive NMT. Code, data, and pre-trained models are available at this https URL
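The abstract names the masking methods without spelling them out. As a rough illustration of what a conditional masked language model trains on, the sketch below builds one bilingual training example: some source tokens are masked, a uniformly sampled fraction of target tokens is replaced by a mask symbol, and the labels are the original target tokens at the masked positions, which a bidirectional decoder would predict from the source plus the unmasked target context. The function names, the `[MASK]` symbol, the fixed 15% source ratio and the uniform target ratio are assumptions for illustration only, not CeMAT's published configuration, and aligned code-switching is not modeled.

```python
import random
from typing import Dict, List, Tuple

MASK = "[MASK]"  # hypothetical mask symbol, chosen for illustration


def mask_tokens(tokens: List[str], n_mask: int, rng: random.Random) -> Tuple[List[str], List[int]]:
    """Replace n_mask randomly chosen positions with MASK; return (masked copy, sorted positions)."""
    n_mask = min(n_mask, len(tokens))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for i in positions:
        masked[i] = MASK
    return masked, positions


def dual_mask(
    src: List[str],
    tgt: List[str],
    rng: random.Random,
    src_ratio: float = 0.15,  # assumed fixed source masking ratio
    tgt_min: float = 0.1,     # assumed range for the sampled target ratio
    tgt_max: float = 1.0,
) -> Tuple[List[str], List[str], Dict[int, str]]:
    """Hypothetical sketch: mask both sides of a sentence pair for CMLM-style training."""
    src_masked, _ = mask_tokens(src, round(src_ratio * len(src)), rng)
    # Sample a fresh target masking ratio for every example.
    tgt_ratio = rng.uniform(tgt_min, tgt_max)
    # Mask at least one target token so there is always something to predict.
    n_tgt = max(1, round(tgt_ratio * len(tgt))) if tgt else 0
    tgt_masked, tgt_positions = mask_tokens(tgt, n_tgt, rng)
    # Labels: the original target tokens at the masked decoder positions.
    labels = {i: tgt[i] for i in tgt_positions}
    return src_masked, tgt_masked, labels


if __name__ == "__main__":
    rng = random.Random(0)
    src = "das ist ein kleiner Test".split()
    tgt = "this is a small test".split()
    s, t, labels = dual_mask(src, tgt, rng)
    print(s)
    print(t, labels)
```

Sampling the target ratio per example, rather than fixing it, exposes the decoder to everything from nearly complete to nearly empty target context, which is the kind of input a non-autoregressive decoder sees during iterative refinement; whether CeMAT samples it this way is an assumption here.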

URL

https://arxiv.org/abs/2203.09210

PDF

https://arxiv.org/pdf/2203.09210.pdf