Abstract
We introduce AdaMoLE, a novel method for fine-tuning large language models (LLMs) through an Adaptive Mixture of Low-Rank Adaptation (LoRA) Experts. Moving beyond conventional methods that employ a static top-k strategy for activating experts, AdaMoLE dynamically adjusts the activation threshold using a dedicated threshold network, adaptively responding to the varying complexities of different tasks. By replacing a single LoRA in a layer with multiple LoRA experts and integrating a gating function with the threshold mechanism, AdaMoLE effectively selects and activates the most appropriate experts based on the input context. Our extensive evaluations across a variety of commonsense reasoning and natural language processing tasks show that AdaMoLE exceeds baseline performance. This enhancement highlights the advantages of AdaMoLE's adaptive selection of LoRA experts, improving model effectiveness without a corresponding increase in the expert count. The experimental validation not only confirms AdaMoLE as a robust approach for enhancing LLMs but also suggests valuable directions for future research in adaptive expert selection mechanisms, potentially broadening the scope for optimizing model performance across diverse language processing tasks.
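The mechanism described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' implementation: a single gating vector and a single threshold vector per layer stand in for the paper's gating function and threshold network, and all names (`AdaMoLELayer`, `gate_w`, `thresh_w`) are invented for the example. An expert is activated when its gate score clears the input-dependent threshold, replacing a fixed top-k cutoff.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class AdaMoLELayer:
    """Minimal sketch of an adaptive mixture of LoRA experts (illustrative only)."""

    def __init__(self, d_in, d_out, n_experts=4, rank=2, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in)) * 0.1   # frozen base weight
        # Each LoRA expert is a low-rank pair: delta_i = B_i @ A_i
        self.A = rng.standard_normal((n_experts, rank, d_in)) * 0.1
        self.B = np.zeros((n_experts, d_out, rank))         # standard LoRA zero-init
        self.gate_w = rng.standard_normal((n_experts, d_in)) * 0.1   # gating function
        self.thresh_w = rng.standard_normal(d_in) * 0.1     # threshold network

    def forward(self, x):
        scores = softmax(self.gate_w @ x)                   # per-expert gate scores
        # Input-dependent threshold: sigmoid keeps it in (0, 1); dividing by the
        # expert count guarantees the top-scoring expert can always clear it.
        tau = 1.0 / (1.0 + np.exp(-self.thresh_w @ x)) / len(scores)
        active = scores > tau                               # adaptive expert selection
        y = self.W @ x
        for i in np.flatnonzero(active):                    # only active experts fire
            y = y + scores[i] * (self.B[i] @ (self.A[i] @ x))
        return y, active
```

Because the threshold depends on the input, easy inputs can route through few experts while harder ones activate more, without changing the total expert count.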
URL
https://arxiv.org/abs/2405.00361