Abstract
Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA have significantly improved the adaptation of LLMs to downstream tasks in a resource-efficient manner. However, in multi-task scenarios, challenges such as training imbalance and the seesaw effect frequently emerge. Mixture-of-LoRA (MoLoRA), which combines LoRA with sparse Mixture-of-Experts, mitigates some of these issues by promoting task-specific learning across experts. Despite this, MoLoRA remains inefficient in terms of training speed, parameter utilization, and overall multi-task performance. In this paper, we propose Mixture of Asymmetric Low-Rank Adaptation (MALoRA), a flexible fine-tuning framework that leverages asymmetric optimization across LoRA experts. MALoRA reduces the number of trainable parameters by 30% to 48%, increases training speed by 1.2x, and matches the computational efficiency of single-task LoRA models. Additionally, MALoRA addresses overfitting issues commonly seen in high-rank configurations, enhancing performance stability. Extensive experiments across diverse multi-task learning scenarios demonstrate that MALoRA consistently outperforms all baseline methods in both inter-domain and intra-domain tasks.
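To make the setup described in the abstract concrete, below is a minimal PyTorch sketch of a Mixture-of-LoRA linear layer (a frozen base projection plus top-k routed LoRA experts), with an optional asymmetric variant that shares the rank-r down-projection across experts. The module name `MoLoRALinear`, the shared-down-projection scheme, and all hyperparameters are illustrative assumptions: the abstract only states that MALoRA optimizes LoRA experts asymmetrically and cuts trainable parameters by 30% to 48%, not the exact decomposition.

```python
# Minimal sketch (PyTorch). The asymmetric sharing scheme below is an assumption,
# not the paper's exact MALoRA decomposition.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoLoRALinear(nn.Module):
    """Frozen base linear layer + top-k routed LoRA experts."""

    def __init__(self, base: nn.Linear, num_experts: int = 4, rank: int = 8,
                 top_k: int = 2, alpha: float = 16.0, share_down_proj: bool = False):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the router and LoRA experts are trained
        in_f, out_f = base.in_features, base.out_features
        self.router = nn.Linear(in_f, num_experts, bias=False)
        if share_down_proj:
            # Asymmetric sketch: one shared down-projection A (in_f x r), per-expert B.
            self.A = nn.Parameter(torch.randn(1, in_f, rank) * 0.01)
        else:
            # Standard MoLoRA: each expert holds its own A and B.
            self.A = nn.Parameter(torch.randn(num_experts, in_f, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, rank, out_f))
        self.top_k, self.scaling = top_k, alpha / rank

    def forward(self, x):  # x: (tokens, in_f)
        out = self.base(x)
        gate = F.softmax(self.router(x), dim=-1)          # (tokens, num_experts)
        weight, idx = gate.topk(self.top_k, dim=-1)       # top-k expert choices
        weight = weight / weight.sum(dim=-1, keepdim=True)
        A = self.A.expand(self.B.shape[0], -1, -1)        # broadcast if A is shared
        for k in range(self.top_k):
            a, b = A[idx[:, k]], self.B[idx[:, k]]        # per-token expert params
            h = torch.bmm(x.unsqueeze(1), a)              # (tokens, 1, r)
            delta = torch.bmm(h, b).squeeze(1)            # (tokens, out_f)
            out = out + weight[:, k:k + 1] * self.scaling * delta
        return out


def trainable_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters() if p.requires_grad)


if __name__ == "__main__":
    base = nn.Linear(4096, 4096, bias=False)
    molora = MoLoRALinear(base, num_experts=4, rank=8)
    shared = MoLoRALinear(base, num_experts=4, rank=8, share_down_proj=True)
    print("MoLoRA trainable params:  ", trainable_params(molora))
    print("shared-A variant params:  ", trainable_params(shared))
    y = molora(torch.randn(3, 4096))  # (3, 4096)
```

With the assumed sizes (hidden width 4096, rank 8, four experts), sharing the down-projection removes roughly a third of the LoRA parameters, which is directionally consistent with the reduction the abstract reports, though it is not a reproduction of the paper's configuration.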
Abstract (translated)
Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA have significantly improved the adaptation of large language models to downstream tasks while remaining resource-efficient. However, in multi-task scenarios, challenges such as training imbalance and the seesaw effect frequently arise. Mixture-of-LoRA (MoLoRA), which combines LoRA with sparse Mixture-of-Experts, alleviates some of these issues by promoting task-specific learning across experts. Nevertheless, MoLoRA remains inefficient in terms of training speed, parameter utilization, and overall multi-task performance. In this paper, we propose Mixture of Asymmetric Low-Rank Adaptation (MALoRA), a flexible fine-tuning framework that leverages asymmetric optimization across LoRA experts. MALoRA reduces the number of trainable parameters by 30% to 48%, increases training speed by 1.2x, and matches the computational efficiency of single-task LoRA models. Furthermore, MALoRA addresses the overfitting problems commonly seen in high-rank configurations, improving performance stability. Extensive experiments across diverse multi-task learning scenarios demonstrate that MALoRA consistently outperforms all baseline methods on both inter-domain and intra-domain tasks.
URL
https://arxiv.org/abs/2410.22782