Very Deep Transformers for Neural Machine Translation

2020-08-18 07:14:54

Xiaodong Liu, Kevin Duh, Liyuan Liu, Jianfeng Gao

arXiv_CL

arXiv_CL NMT Transformer

Abstract

We explore the application of very deep Transformer models for Neural Machine Translation (NMT). Using a simple yet effective initialization technique that stabilizes training, we show that it is feasible to build standard Transformer-based models with up to 60 encoder layers and 12 decoder layers. These deep models outperform their baseline 6-layer counterparts by as much as 2.5 BLEU, and achieve new state-of-the-art benchmark results on WMT14 English-French (43.8 BLEU) and WMT14 English-German (30.1 BLEU). The code and trained models will be publicly available at: this https URL.
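To give a rough sense of the scale involved, the sketch below counts the parameters in the layer stacks of a standard Transformer with 60 encoder and 12 decoder layers and compares it with a 6-layer baseline. The model width (`d_model=512`) and feed-forward width (`d_ff=2048`) are illustrative assumptions, not values taken from the abstract. The function names are hypothetical, and embeddings and the output projection are deliberately left out.

```python
# Rough parameter count for the layer stacks of a standard Transformer.
# ASSUMPTIONS (not from the abstract): d_model=512, d_ff=2048, biases on
# all linear maps, LayerNorm with scale and shift. Embeddings and the
# output projection are excluded.


def attention_params(d_model: int) -> int:
    """Q, K, V and output projections, each d_model x d_model plus a bias."""
    return 4 * (d_model * d_model + d_model)


def feed_forward_params(d_model: int, d_ff: int) -> int:
    """Two linear maps (d_model -> d_ff -> d_model), each with a bias."""
    return d_model * d_ff + d_ff + d_ff * d_model + d_model


def layer_norm_params(d_model: int) -> int:
    """One scale vector and one shift vector."""
    return 2 * d_model


def encoder_layer_params(d_model: int, d_ff: int) -> int:
    """Self-attention, feed-forward, and two LayerNorms."""
    return (attention_params(d_model)
            + feed_forward_params(d_model, d_ff)
            + 2 * layer_norm_params(d_model))


def decoder_layer_params(d_model: int, d_ff: int) -> int:
    """Self-attention, cross-attention, feed-forward, and three LayerNorms."""
    return (2 * attention_params(d_model)
            + feed_forward_params(d_model, d_ff)
            + 3 * layer_norm_params(d_model))


def stack_params(n_enc: int, n_dec: int, d_model: int = 512, d_ff: int = 2048) -> int:
    """Total parameters in the encoder and decoder stacks."""
    return (n_enc * encoder_layer_params(d_model, d_ff)
            + n_dec * decoder_layer_params(d_model, d_ff))


if __name__ == "__main__":
    baseline = stack_params(6, 6)
    deep = stack_params(60, 12)
    print(f"6-6 stacks:   {baseline:,} parameters")
    print(f"60-12 stacks: {deep:,} parameters")
    print(f"ratio:        {deep / baseline:.2f}x")
```

Under these assumed widths, the 60-12 configuration has roughly 5.4 times as many layer-stack parameters as the 6-6 baseline, with most of the growth on the encoder side.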

URL

https://arxiv.org/abs/2008.07772

PDF

https://arxiv.org/pdf/2008.07772.pdf