Abstract
Recent advances in decentralized deep learning algorithms have demonstrated cutting-edge performance on various tasks with large pre-trained models. However, this competitiveness comes at the cost of significant communication and computation overhead when updating these models, which prevents their application to real-world scenarios. To address this issue, drawing inspiration from advanced model merging techniques that require no additional training, we introduce the Decentralized Iterative Merging-And-Training (DIMAT) paradigm, a novel decentralized deep learning framework. Within DIMAT, each agent is trained on its local data and periodically merged with its neighboring agents using advanced model merging techniques such as activation matching, until convergence is achieved. DIMAT provably converges at the best available rate for nonconvex functions with various first-order methods, while yielding tighter error bounds than popular existing approaches. We conduct a comprehensive empirical analysis to validate DIMAT's superiority over baselines across diverse computer vision tasks drawn from multiple datasets. Empirical results validate our theoretical claims by showing that DIMAT attains a faster and higher initial gain in accuracy with both independent and identically distributed (IID) and non-IID data, while incurring lower communication overhead. The DIMAT paradigm presents a new opportunity for future decentralized learning, enhancing its adaptability to real-world settings through sparse and lightweight communication and computation.
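The train-then-merge loop described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation and rests on several assumptions: each agent holds a small parameter vector, the local objective is a quadratic with a per-agent optimum (a stand-in for non-IID data), the agents sit on a ring, and merging is plain uniform averaging. Plain averaging replaces activation matching here, which would additionally align hidden units between models before combining them. All function and parameter names are hypothetical.

```python
import random


def local_train(w, target, lr=0.1, steps=5):
    """Run a few gradient steps on the toy local loss 0.5 * ||w - target||^2."""
    for _ in range(steps):
        # The gradient of the toy loss with respect to w is (w - target).
        w = [wi - lr * (wi - ti) for wi, ti in zip(w, target)]
    return w


def merge_with_neighbors(weights, neighbors):
    """Replace each agent's weights by the uniform average over itself and its neighbors.

    Assumption: plain averaging stands in for activation matching, which would
    first align (permute) hidden units between models before combining them.
    """
    merged = []
    for i, w in enumerate(weights):
        group = [w] + [weights[j] for j in neighbors[i]]
        merged.append([sum(vals) / len(group) for vals in zip(*group)])
    return merged


def run_dimat_sketch(num_agents=4, dim=3, rounds=20, seed=0):
    """Alternate local training and neighbor merging for a fixed number of rounds."""
    rng = random.Random(seed)
    # Each agent gets its own local optimum (a toy stand-in for non-IID data).
    targets = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_agents)]
    weights = [[0.0] * dim for _ in range(num_agents)]
    # Hypothetical ring topology: each agent talks only to its two adjacent agents.
    neighbors = {
        i: [(i - 1) % num_agents, (i + 1) % num_agents] for i in range(num_agents)
    }
    for _ in range(rounds):
        weights = [local_train(w, t) for w, t in zip(weights, targets)]
        weights = merge_with_neighbors(weights, neighbors)
    return weights, targets
```

Calling `run_dimat_sketch()` returns the final per-agent weights together with the per-agent local optima. In this toy setting the agents end up much closer to one another than their local optima are, and their average tracks the average of the local optima, even though each agent only ever communicates with two neighbors.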
URL
https://arxiv.org/abs/2404.08079