Paper Reading AI Learner

Beyond Vanilla Fine-Tuning: Leveraging Multistage, Multilingual, and Domain-Specific Methods for Low-Resource Machine Translation

2025-03-28 16:30:28
Sarubi Thillainathan, Songchen Yuan, En-Shiun Annie Lee, Sanath Jayasena, Surangika Ranathunga

Abstract

Fine-tuning multilingual sequence-to-sequence large language models (msLLMs) has shown promise in developing neural machine translation (NMT) systems for low-resource languages (LRLs). However, conventional single-stage fine-tuning methods struggle in extremely low-resource NMT settings, where training data is very limited. This paper contributes to artificial intelligence by proposing two approaches for adapting msLLMs in these challenging scenarios: (1) continual pre-training (CPT), where the msLLM is further trained with domain-specific monolingual data to compensate for the under-representation of LRLs, and (2) intermediate task transfer learning (ITTL), a method that fine-tunes the msLLM with both in-domain and out-of-domain parallel data to enhance its translation capabilities across various domains and tasks. As an application in engineering, these methods are implemented in NMT systems for Sinhala, Tamil, and English (six language pairs) in domain-specific, extremely low-resource settings (datasets containing fewer than 100,000 samples). Our experiments reveal that these approaches improve translation performance by an average of +1.47 bilingual evaluation understudy (BLEU) points over the standard single-stage fine-tuning baseline across all translation directions. Additionally, ensembling multiple models yields a further gain in BLEU score.
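
To make the multistage recipe concrete, below is a minimal sketch of the ITTL stage (fine-tune on out-of-domain parallel data, then continue on in-domain parallel data) using the HuggingFace transformers and datasets libraries. The facebook/mbart-large-50 checkpoint, the hyperparameters, and the toy sentence pairs are illustrative assumptions rather than the authors' exact setup; the CPT stage, which requires a denoising objective on domain-specific monolingual text, is indicated only in a comment.

    # Sketch of the two-stage adaptation described in the abstract.
    # Stage 1 (CPT: denoising on domain-specific monolingual text) is
    # omitted; this sketch covers stage 2 (ITTL): fine-tune on
    # out-of-domain parallel data, then continue on in-domain data.
    from datasets import Dataset
    from transformers import (
        AutoModelForSeq2SeqLM,
        AutoTokenizer,
        DataCollatorForSeq2Seq,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    CHECKPOINT = "facebook/mbart-large-50"  # assumed msLLM backbone
    tokenizer = AutoTokenizer.from_pretrained(
        CHECKPOINT, src_lang="en_XX", tgt_lang="si_LK"  # English -> Sinhala
    )
    model = AutoModelForSeq2SeqLM.from_pretrained(CHECKPOINT)

    def finetune(model, pairs, output_dir, epochs=3):
        """One supervised fine-tuning stage on (source, target) pairs."""
        def encode(batch):
            # text_target tokenizes the target side into labels
            return tokenizer(
                batch["src"], text_target=batch["tgt"],
                truncation=True, max_length=128,
            )

        ds = Dataset.from_dict(
            {"src": [s for s, _ in pairs], "tgt": [t for _, t in pairs]}
        ).map(encode, batched=True, remove_columns=["src", "tgt"])

        trainer = Seq2SeqTrainer(
            model=model,
            args=Seq2SeqTrainingArguments(
                output_dir=output_dir,
                num_train_epochs=epochs,
                per_device_train_batch_size=8,
                learning_rate=3e-5,
                save_strategy="no",
                report_to=[],
            ),
            train_dataset=ds,
            data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
        )
        trainer.train()
        return trainer.model

    # Placeholder parallel data; the paper's stages use real corpora of
    # fewer than 100,000 samples each.
    out_of_domain = [("The weather is nice today.", "අද කාලගුණය හොඳයි.")]
    in_domain = [("The court adjourned the hearing.",
                  "අධිකරණය නඩු විභාගය කල් තැබීය.")]

    # ITTL: intermediate (out-of-domain) task first, then in-domain task.
    model = finetune(model, out_of_domain, "stage_ittl_ood")
    model = finetune(model, in_domain, "stage_ittl_ind")

In the paper's full pipeline, CPT would precede these two calls, and models from multiple runs could then be ensembled for the additional BLEU gain reported in the abstract; both steps are omitted here for brevity.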

URL

https://arxiv.org/abs/2503.22582

PDF

https://arxiv.org/pdf/2503.22582.pdf

