Abstract
A primary function of back-propagation is to compute the gradients of both hidden representations and parameters for optimization with gradient descent. Training large models incurs high computational cost because of their vast parameter sizes. While Parameter-Efficient Fine-Tuning (PEFT) methods aim to reduce memory usage by training smaller auxiliary models, they still incur computational overhead, especially in Fine-Tuning as a Service (FTaaS) settings that serve numerous users. We introduce Collaborative Adaptation (ColA) with Gradient Learning (GL), a parameter-free, model-agnostic fine-tuning approach that decouples the computation of the gradients of hidden representations from that of parameters. Compared with PEFT methods, ColA enables more cost-effective FTaaS by offloading the gradient computation to low-cost devices. We also provide a theoretical analysis of ColA and demonstrate experimentally that it performs on par with or better than existing PEFT methods on various benchmarks.
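To make the described decoupling concrete, below is a minimal sketch of the idea under stated assumptions: a PyTorch-style setup in which the frozen backbone computes only the gradient of a hidden representation, while an auxiliary adapter is fitted separately on a low-cost device from that representation and its gradient. The names `backbone`, `adapter`, and `offload_device` are illustrative assumptions, not the paper's reference implementation.

```python
# Hedged sketch of decoupled gradient computation (assumes PyTorch; all names
# are hypothetical and chosen for illustration, not taken from the paper).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Frozen backbone layer on the "server"; auxiliary adapter on a low-cost device.
backbone = nn.Linear(16, 16)
for p in backbone.parameters():
    p.requires_grad_(False)

offload_device = "cpu"                       # stand-in for a low-cost device
adapter = nn.Linear(16, 16).to(offload_device)
opt = torch.optim.SGD(adapter.parameters(), lr=1e-2)

x = torch.randn(8, 16)
target = torch.randn(8, 16)

# 1) Server forward pass: add the adapter's (detached) output to the hidden state.
h = backbone(x)
with torch.no_grad():
    delta = adapter(h.to(offload_device)).to(h.device)
h_adapted = (h + delta).detach().requires_grad_(True)

# 2) Server backward pass: compute only the gradient of the hidden representation,
#    not of any adapter parameters.
loss = nn.functional.mse_loss(h_adapted, target)
loss.backward()
grad_h = h_adapted.grad

# 3) Offloaded step: fit the adapter so its output moves the hidden representation
#    along the negative gradient direction (a functional gradient-descent target).
h_off = h.detach().to(offload_device)
pseudo_target = (delta.to(offload_device) - grad_h.to(offload_device)).detach()
adapter_loss = nn.functional.mse_loss(adapter(h_off), pseudo_target)
opt.zero_grad()
adapter_loss.backward()
opt.step()
```

In this sketch, only step 3 touches adapter parameters, so that work can run on cheaper hardware, which is the cost-saving pattern the abstract attributes to ColA for FTaaS.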
URL
https://arxiv.org/abs/2404.13844