
DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models

2024-04-11 18:34:29
Nastaran Saadati, Minh Pham, Nasla Saleem, Joshua R. Waite, Aditya Balu, Zhanhong Jiang, Chinmay Hegde, Soumik Sarkar

Abstract

Recent advances in decentralized deep learning algorithms have demonstrated cutting-edge performance on various tasks with large pre-trained models. However, this level of competitiveness comes with significant communication and computation overheads when updating these models, which prohibits their application to real-world scenarios. To address this issue, drawing inspiration from advanced model merging techniques that require no additional training, we introduce the Decentralized Iterative Merging-And-Training (DIMAT) paradigm, a novel decentralized deep learning framework. Within DIMAT, each agent is trained on its local data and periodically merged with its neighboring agents using advanced model merging techniques such as activation matching until convergence is achieved. DIMAT provably converges with the best available rate for nonconvex functions with various first-order methods, while yielding tighter error bounds compared to popular existing approaches. We conduct a comprehensive empirical analysis to validate DIMAT's superiority over baselines across diverse computer vision tasks sourced from multiple datasets. Empirical results validate our theoretical claims by showing that DIMAT attains a faster and higher initial gain in accuracy with both independent and identically distributed (IID) and non-IID data, while incurring lower communication overhead. The DIMAT paradigm presents a new opportunity for future decentralized learning, enhancing its adaptability to real-world settings through sparse and lightweight communication and computation.
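The train-then-merge loop described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: it assumes a linear regression model per agent, a ring communication topology, and plain neighbor weight averaging swapped in for activation matching. All function names and parameters (`local_gd_step`, `ring_mixing_matrix`, `merge_with_neighbors`, `run`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def local_gd_step(w, X, y, lr):
    """One full-batch gradient step on the least-squares loss of a linear model."""
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring (assumed topology).

    Each agent gives weight 1/3 to itself and to each of its two ring neighbors.
    """
    m = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            m[i, j % n] += 1.0 / 3.0
    return m


def merge_with_neighbors(weights, mixing):
    """Merge step: each agent's weights become a weighted average over its neighbors.

    ASSUMPTION: plain averaging stands in for the activation-matching merge
    named in the abstract; it is a much simpler technique.
    """
    return mixing @ weights


def run(n_agents=4, dim=5, rounds=20, local_steps=10, lr=0.2, seed=0):
    """Alternate local training and neighbor merging on synthetic IID data."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=dim)

    # Each agent holds its own local dataset drawn from the same distribution.
    data = []
    for _ in range(n_agents):
        X = rng.normal(size=(50, dim))
        data.append((X, X @ w_true))

    weights = np.zeros((n_agents, dim))  # one weight vector per agent
    mixing = ring_mixing_matrix(n_agents)

    for _ in range(rounds):
        # Training phase: every agent takes several local gradient steps.
        for i, (X, y) in enumerate(data):
            for _ in range(local_steps):
                weights[i] = local_gd_step(weights[i], X, y, lr)
        # Merging phase: agents exchange weights only once per round.
        weights = merge_with_neighbors(weights, mixing)

    return weights, w_true


if __name__ == "__main__":
    weights, w_true = run()
    errors = np.linalg.norm(weights - w_true, axis=1)
    print("per-agent distance to the true weights:", errors)
```

Because agents communicate only once per round rather than after every gradient step, the number of local steps per round controls the trade-off between communication cost and how far the agents drift apart between merges.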


URL

https://arxiv.org/abs/2404.08079

PDF

https://arxiv.org/pdf/2404.08079.pdf

