Abstract
Federated Learning (FL) has become a popular distributed learning paradigm in which multiple clients collaboratively train a global model in a data privacy-preserving manner. However, real-world data samples usually follow a long-tailed distribution, and FL on decentralized, long-tailed data yields a poorly behaved global model that is severely biased toward the head classes holding the majority of the training samples. To alleviate this issue, decoupled training has recently been introduced to FL, as it has achieved promising results in centralized long-tailed learning by re-balancing the biased classifier after instance-balanced training. However, prior work limits the capacity of decoupled training in federated long-tailed learning: because a globally balanced dataset is unavailable in FL, the classifier is re-trained on a set of pseudo features and remains sub-optimal. In this work, to re-balance the classifier more effectively, we integrate the local real data with global gradient prototypes to form local balanced datasets, and thus re-balance the classifier during local training. Furthermore, we introduce an extra classifier in the training phase to help model the global data distribution, which addresses the contradictory optimization goals caused by performing classifier re-balancing locally. Extensive experiments show that our method consistently outperforms the existing state-of-the-art methods in various settings.
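The core idea of forming a local balanced dataset can be illustrated with a minimal sketch. This is not the paper's actual algorithm: for simplicity it uses per-class feature prototypes as a stand-in for the paper's global gradient prototypes, and all names (`build_balanced_set`, `per_class`, the toy data) are hypothetical. Each class contributes the same number of samples, drawn from local real features where available and padded with global prototypes otherwise, so a classifier re-trained on the result is not dominated by the head classes.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, feat_dim = 4, 8

# Hypothetical long-tailed client: it only holds samples for classes 0 and 1,
# with class 0 as the head class.
local_feats = {
    0: rng.normal(size=(20, feat_dim)),
    1: rng.normal(size=(5, feat_dim)),
}

# Hypothetical global prototypes, one vector per class (e.g. aggregated at the
# server); the paper uses gradient prototypes, feature prototypes stand in here.
global_protos = {c: rng.normal(size=feat_dim) for c in range(num_classes)}

def build_balanced_set(local_feats, global_protos, per_class=5):
    """Form a class-balanced set: real local features where available,
    padded with copies of the global prototype for missing/tail classes."""
    X, y = [], []
    for c, proto in global_protos.items():
        feats = local_feats.get(c, np.empty((0, proto.shape[0])))
        take = min(len(feats), per_class)
        chosen = feats[:take]
        if take < per_class:
            # Fill the shortfall for tail/absent classes with the prototype.
            pad = np.tile(proto, (per_class - take, 1))
            chosen = np.vstack([chosen, pad])
        X.append(chosen)
        y.extend([c] * per_class)
    return np.vstack(X), np.array(y)

X, y = build_balanced_set(local_feats, global_protos)
```

Re-training only the classifier head on `(X, y)` during local training is then an instance of the classifier re-balancing step the abstract describes.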
Abstract (translated)
Federated Learning (FL) has become a popular distributed learning paradigm in which multiple clients collaboratively train a global model while preserving data privacy. However, real-world data samples usually follow a long-tailed distribution, and FL on such decentralized, long-tailed data produces a poorly behaved global model that is heavily biased toward the head classes containing most of the training samples. To mitigate this, decoupled training has recently been brought into FL, given its promising results in centralized long-tailed learning, where the biased classifier is re-balanced after instance-balanced training. However, existing work limits the capacity of decoupled training in federated long-tailed learning: since no globally balanced dataset is available in FL, the classifier is re-trained on a set of pseudo features and performs sub-optimally. In this work, to re-balance the classifier more effectively, the local real data is integrated with global gradient prototypes to form local balanced datasets, so the classifier can be re-balanced during local training. In addition, an extra classifier is introduced in the training phase to help model the global data distribution, resolving the contradictory optimization goals caused by performing classifier re-balancing locally. Extensive experiments show the method consistently outperforms existing state-of-the-art methods across various settings.
URL
https://arxiv.org/abs/2301.10394