Lego-MT: Towards Detachable Models in Massively Multilingual Machine Translation

2022-12-20 18:54:08

Fei Yuan, Yinquan Lu, WenHao Zhu, Lingpeng Kong, Lei Li, Jingjing Xu

arXiv_AI

arXiv_AI NMT Pose

Abstract

Traditional multilingual neural machine translation (MNMT) uses a single model to translate all directions. However, as the number of language pairs grows, relying on a single model for massive MNMT brings new challenges: parameter tension and heavy computation. In this paper, we revisit multi-way structures by assigning an individual branch to each language (group). Despite the simplicity of this architecture, de-centralized models are challenging to train because nothing constrains the representations of the different languages to align. We propose a localized training recipe that maps the different branches into a unified space, resulting in an efficient detachable model, Lego-MT. For a fair comparison, we collect data from OPUS and build the first large-scale open-source translation benchmark, covering 7 language-centric datasets, each containing 445 language pairs. Experiments show that Lego-MT (1.2B) brings gains of more than 4 BLEU while outperforming M2M-100 (12B). We will release all training data, models, and checkpoints.
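The idea of a detachable multi-way model can be illustrated with a minimal toy sketch. This is not the Lego-MT implementation: the lexicons, the names `make_encoder`, `make_decoder`, and `DetachableTranslator`, and the use of integer concept ids as the "unified space" are all assumptions made for illustration. The sketch only shows the structural point that each language has its own branch, that all branches read from or write to one shared representation, and that a given translation direction needs only its two branches to be loaded.

```python
from typing import Callable, Dict, List

# Hypothetical toy lexicons: each word maps to a language-independent concept id.
# The list of concept ids stands in for the shared ("unified") representation space.
LEXICONS: Dict[str, Dict[str, int]] = {
    "en": {"hello": 0, "world": 1},
    "fr": {"bonjour": 0, "monde": 1},
    "de": {"hallo": 0, "welt": 1},
}


def make_encoder(lang: str) -> Callable[[str], List[int]]:
    """Build the encoder branch for one language (text -> shared space)."""
    lexicon = LEXICONS[lang]

    def encode(text: str) -> List[int]:
        return [lexicon[word] for word in text.lower().split()]

    return encode


def make_decoder(lang: str) -> Callable[[List[int]], str]:
    """Build the decoder branch for one language (shared space -> text)."""
    inverse = {concept: word for word, concept in LEXICONS[lang].items()}

    def decode(concepts: List[int]) -> str:
        return " ".join(inverse[c] for c in concepts)

    return decode


class DetachableTranslator:
    """Loads a language branch only when a translation direction needs it."""

    def __init__(self) -> None:
        self.encoders: Dict[str, Callable[[str], List[int]]] = {}
        self.decoders: Dict[str, Callable[[List[int]], str]] = {}

    def translate(self, text: str, src: str, tgt: str) -> str:
        # Attach the source encoder and target decoder on first use.
        if src not in self.encoders:
            self.encoders[src] = make_encoder(src)
        if tgt not in self.decoders:
            self.decoders[tgt] = make_decoder(tgt)
        shared = self.encoders[src](text)
        return self.decoders[tgt](shared)


model = DetachableTranslator()
print(model.translate("hello world", "en", "fr"))  # bonjour monde
print(sorted(model.encoders), sorted(model.decoders))  # ['en'] ['fr']
```

In this toy setting the German branch is never built for an English-to-French request, which mirrors the motivation for detaching branches: inference cost depends on the branches in use rather than on the total number of supported languages. The alignment problem the abstract describes is trivially solved here by hand-written concept ids; in a trained model that alignment is what the localized training recipe is meant to provide.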

URL

https://arxiv.org/abs/2212.10551

PDF

https://arxiv.org/pdf/2212.10551.pdf