Paper Reading AI Learner

When to Trust Aggregated Gradients: Addressing Negative Client Sampling in Federated Learning

2023-01-25 03:52:45
Wenkai Yang, Yankai Lin, Guangxiang Zhao, Peng Li, Jie Zhou, Xu Sun

Abstract

Federated learning has become a widely used framework for learning a global model on decentralized local datasets while protecting local data privacy. However, federated learning faces severe optimization difficulties when training samples are not independently and identically distributed (non-i.i.d.). In this paper, we point out that client sampling plays a decisive role in these optimization difficulties. We find that negative client sampling causes the merged data distribution of the currently sampled clients to be heavily inconsistent with that of all available clients, which in turn makes the aggregated gradient unreliable. To address this issue, we propose a novel learning rate adaptation mechanism that adaptively adjusts the server learning rate for the aggregated gradient in each round, according to the consistency between the merged data distribution of the currently sampled clients and that of all available clients. Specifically, through theoretical deduction we identify a meaningful and robust indicator that is positively related to the optimal server learning rate and effectively reflects the merged data distribution of the sampled clients, and we use this indicator to adapt the server learning rate. Extensive experiments on multiple image and text classification tasks validate the effectiveness of our method.
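The mechanism described above can be sketched in code. This is a minimal illustration, not the paper's actual method: the paper derives its indicator theoretically, whereas here we assume a simple stand-in (cosine similarity between the size-weighted merged label histogram of the sampled clients and the global label histogram) to scale the server learning rate in a FedAvg-style update. All function names and the similarity proxy are hypothetical.

```python
import numpy as np

def label_histogram(labels, num_classes):
    """Normalized label distribution of one client's local dataset."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    return counts / counts.sum()

def adaptive_server_lr(sampled_hists, sizes, global_hist, base_lr=1.0):
    """Scale the server learning rate by how consistent the merged label
    distribution of the sampled clients is with that of all clients.
    Cosine similarity is used here as a stand-in indicator, not the
    paper's theoretically derived one."""
    sizes = np.asarray(sizes, dtype=float)
    # Size-weighted merge of the sampled clients' label distributions.
    merged = np.average(sampled_hists, axis=0, weights=sizes)
    sim = (merged @ global_hist) / (
        np.linalg.norm(merged) * np.linalg.norm(global_hist)
    )
    return base_lr * sim

def server_update(global_model, client_updates, sizes, lr):
    """FedAvg-style aggregation: apply the size-weighted mean of the
    client updates, scaled by the (adapted) server learning rate."""
    sizes = np.asarray(sizes, dtype=float)
    agg = np.average(client_updates, axis=0, weights=sizes)
    return global_model + lr * agg
```

In a round where the sampled clients jointly cover the global label distribution, the similarity is 1 and the full base learning rate is applied; when a skewed (negative) sample is drawn, the similarity shrinks and the aggregated gradient is trusted less.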

URL

https://arxiv.org/abs/2301.10400

PDF

https://arxiv.org/pdf/2301.10400.pdf
