Communication Efficient Federated Learning for Multilingual Neural Machine Translation with Adapter

2023-05-21 12:48:38
Yi Liu, Xiaohan Bi, Lei Li, Sishuo Chen, Wenkai Yang, Xu Sun

Abstract

Federated Multilingual Neural Machine Translation (Fed-MNMT) has emerged as a promising paradigm for institutions with limited language resources. This approach allows multiple institutions to act as clients and train a unified model through model synchronization, rather than collecting sensitive data for centralized training. This significantly reduces the cost of corpus collection and preserves data privacy. However, as pre-trained language models (PLMs) continue to increase in size, the communication cost of transmitting parameters during synchronization has become a training speed bottleneck. In this paper, we propose a communication-efficient Fed-MNMT framework that addresses this issue by keeping PLMs frozen and only transferring lightweight adapter modules between clients. Since different language pairs exhibit substantial discrepancies in data distributions, the adapter parameters of different clients may conflict with each other. To tackle this, we explore various clustering strategies to group parameters for integration and mitigate the negative effects of conflicting parameters. Experimental results demonstrate that our framework reduces communication cost by over 98% while achieving similar or even better performance compared to competitive baselines. Further analysis reveals that the clustering strategies effectively solve the problem of linguistic discrepancy, and that pruning adapter modules further improves communication efficiency.
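The efficiency idea described in the abstract can be pictured with a minimal PyTorch sketch: a bottleneck adapter is inserted into the PLM, the backbone is frozen, and only the adapter weights are trained locally and exchanged during synchronization. The names below (Adapter, freeze_backbone, and the convention that adapter parameters contain "adapter" in their names) are illustrative assumptions, not the authors' released code.

import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual connection leaves the frozen backbone's
        # representation intact when the adapter is near-zero.
        return x + self.up(self.act(self.down(x)))

def freeze_backbone(model: nn.Module) -> None:
    """Train (and later transmit) only adapter parameters; freeze the rest."""
    for name, param in model.named_parameters():
        param.requires_grad = "adapter" in name

Because each adapter contributes only two small linear layers per module, the per-round communication payload shrinks from the full PLM to a small fraction of it, which is consistent with the over-98% reduction reported in the abstract.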

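On the aggregation side, the clustering strategies mentioned in the abstract amount to averaging adapter parameters only within groups of clients whose language pairs are compatible, so that clients with conflicting data distributions are never merged together. The sketch below is a hedged illustration under assumed conventions: cluster_average is a hypothetical helper, each client is assumed to upload its adapter state_dict, and the grouping by language family is one plausible clustering criterion, not the paper's specified one.

import torch

def cluster_average(client_adapters: dict, clusters: dict) -> dict:
    """Average adapter state_dicts within each cluster of clients only."""
    merged = {}
    for members in clusters.values():
        states = [client_adapters[c] for c in members]
        # Parameter-wise mean over the cluster's adapter tensors.
        avg = {key: torch.stack([s[key] for s in states]).mean(dim=0)
               for key in states[0]}
        for c in members:
            merged[c] = avg
    return merged

# Hypothetical usage: group clients by language family before aggregation.
clusters = {"romance": ["fr-en", "es-en"], "slavic": ["ru-en", "pl-en"]}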

URL

https://arxiv.org/abs/2305.12449

PDF

https://arxiv.org/pdf/2305.12449.pdf

