
Multilingual Non-Autoregressive Machine Translation without Knowledge Distillation

2025-02-06 22:16:28
Chenyang Huang, Fei Huang, Zaixiang Zheng, Osmar R. Zaïane, Hao Zhou, Lili Mou

Abstract

Multilingual neural machine translation (MNMT) aims at using one single model for multiple translation directions. Recent work applies non-autoregressive Transformers to improve the efficiency of MNMT, but requires expensive knowledge distillation (KD) processes. To this end, we propose an M-DAT approach to non-autoregressive multilingual machine translation. Our system leverages the recent advance of the directed acyclic Transformer (DAT), which does not require KD. We further propose a pivot back-translation (PivotBT) approach to improve the generalization to unseen translation directions. Experiments show that our M-DAT achieves state-of-the-art performance in non-autoregressive MNMT.

Summary

- MNMT aims to use a single model to translate between multiple languages.
- Recent work uses non-autoregressive Transformers to improve translation efficiency, but this requires a complex and computationally expensive knowledge distillation (KD) process.
- The authors propose M-DAT, which builds on the directed acyclic Transformer (DAT) architecture and needs no knowledge distillation step.
- They also introduce pivot back-translation (PivotBT) to improve translation for language pairs not seen during training.
- Experiments show that M-DAT achieves state-of-the-art results in non-autoregressive MNMT.

URL

https://arxiv.org/abs/2502.04537

PDF

https://arxiv.org/pdf/2502.04537.pdf

