Abstract
Large pre-trained Transformer models achieve state-of-the-art results across diverse language and reasoning tasks, but full fine-tuning incurs substantial storage, memory, and computational overhead. Parameter-efficient fine-tuning (PEFT) methods mitigate these costs by learning only a small subset of task-specific parameters, yet existing approaches either introduce inference-time latency (adapter modules), suffer from suboptimal convergence (randomly initialized low-rank updates), or rely on fixed rank choices that may not match task complexity (Kronecker-based decompositions). We propose SoKA (SVD on Kronecker Adaptation), a novel PEFT strategy that combines Kronecker-product tensor factorization with SVD-driven initialization and spectrum-aware dynamic rank selection. Our Kronecker-Product SVD (KPSVD) procedure extracts principal components of the full weight update into compact Kronecker factors, while an adaptive rank selection algorithm uses energy-threshold and elbow-point criteria to prune negligible components. Empirical evaluation on LLaMA2-7B across arithmetic reasoning (GSM8K), formal mathematics (MATH), and code generation (MBPP) demonstrates that SoKA requires only 0.99M trainable parameters, 25% fewer than LoRA/PiSSA, while matching or exceeding baseline performance. Moreover, SoKA exhibits faster convergence and more stable gradients, highlighting its robustness and efficiency for large-scale model adaptation.
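The abstract does not spell out the KPSVD construction or the rank-selection rule, so the following is a minimal sketch of the idea under stated assumptions: the Kronecker-product SVD is taken to be the standard Van Loan-Pitsianis rearrangement (reshape the weight update so that each Kronecker term becomes a rank-1 term, then apply an ordinary SVD), and the "energy-threshold plus elbow-point" pruning is illustrated with a 95% energy cutoff capped by the largest relative drop in the spectrum. The function names kpsvd and select_rank, the 0.95 threshold, and the way the two criteria are combined are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def kpsvd(delta_w, m1, n1, m2, n2):
    """Kronecker-Product SVD sketch (Van Loan-Pitsianis rearrangement).

    delta_w has shape (m1*m2, n1*n2). Returns singular values s and factor
    pairs (A_k, B_k) with A_k of shape (m1, n1) and B_k of shape (m2, n2),
    so that delta_w ~= sum_k s[k] * np.kron(A_k, B_k).
    """
    # Split delta_w into an (m1 x n1) grid of (m2 x n2) blocks and flatten
    # each block into a row; a pure Kronecker product becomes rank-1 here.
    blocks = delta_w.reshape(m1, m2, n1, n2).transpose(0, 2, 1, 3)
    r = blocks.reshape(m1 * n1, m2 * n2)
    u, s, vt = np.linalg.svd(r, full_matrices=False)
    factors = [(u[:, k].reshape(m1, n1), vt[k].reshape(m2, n2))
               for k in range(len(s))]
    return s, factors

def select_rank(s, energy=0.95):
    """Spectrum-aware rank choice: smallest rank capturing `energy` of the
    squared-singular-value mass, capped by an elbow point found at the
    largest relative drop in the spectrum (this combination is an assumption)."""
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    r_energy = int(np.searchsorted(cum, energy)) + 1
    drops = s[:-1] / np.maximum(s[1:], 1e-12)
    r_elbow = int(np.argmax(drops)) + 1
    return max(1, min(r_energy, r_elbow))

# Toy check: decompose a random 64x64 "weight update" as (8*8) x (8*8)
# Kronecker factors and measure the truncated reconstruction error.
rng = np.random.default_rng(0)
dw = rng.standard_normal((64, 64))
s, factors = kpsvd(dw, 8, 8, 8, 8)
r = select_rank(s)
approx = sum(s[k] * np.kron(*factors[k]) for k in range(r))
print(r, np.linalg.norm(dw - approx) / np.linalg.norm(dw))
```

In the setting the abstract describes, the retained Kronecker factor pairs would serve as the SVD-driven initialization of the compact adapter, with the pruning step keeping only the components that carry non-negligible spectral energy.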
URL
https://arxiv.org/abs/2506.15251