Paper Reading AI Learner

Towards Boosting Many-to-Many Multilingual Machine Translation with Large Language Models

2024-01-11 12:11:30
Pengzhi Gao, Zhongjun He, Hua Wu, Haifeng Wang

Abstract

The training paradigm for machine translation has gradually shifted, from learning neural machine translation (NMT) models with extensive parallel corpora to instruction finetuning on pretrained multilingual large language models (LLMs) with high-quality translation pairs. In this paper, we focus on boosting the many-to-many multilingual translation performance of LLMs with an emphasis on zero-shot translation directions. We demonstrate that prompt strategies adopted during instruction finetuning are crucial to zero-shot translation performance and introduce a cross-lingual consistency regularization, XConST, to bridge the representation gap among different languages and improve zero-shot translation performance. XConST is not a new method, but a version of CrossConST (Gao et al., 2023a) adapted for multilingual finetuning on LLMs with translation instructions. Experimental results on ALMA (Xu et al., 2023) and LLaMA-2 (Touvron et al., 2023) show that our approach consistently improves translation performance. Our implementations are available at this https URL.
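The abstract's key idea, a cross-lingual consistency regularizer that pulls the model's output distributions for different input languages toward each other, can be sketched as a toy loss function. This is only an illustrative sketch, not the paper's implementation: the function names (`consistency_loss`, `kl`), the NumPy setting, and the single-vector logits are all assumptions for demonstration; the actual XConST term operates on token-level decoder distributions during instruction finetuning.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-9):
    """KL divergence KL(p || q) between two probability vectors."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def consistency_loss(ce_loss, logits_a, logits_b, alpha=1.0):
    """Toy consistency-regularized objective (hypothetical signature):
    the usual cross-entropy translation loss plus a KL term that
    penalizes disagreement between the output distributions produced
    for two cross-lingual views of the same example."""
    p = softmax(logits_a)
    q = softmax(logits_b)
    return ce_loss + alpha * kl(p, q)
```

When the two views yield identical logits, the KL term vanishes and the objective reduces to the plain cross-entropy loss; as the distributions diverge, the penalty grows, which is the mechanism the abstract credits with bridging the representation gap among languages.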

URL

https://arxiv.org/abs/2401.05861

PDF

https://arxiv.org/pdf/2401.05861.pdf
