Paper Reading AI Learner

Singular Value Decomposition on Kronecker Adaptation for Large Language Model

2025-06-18 08:28:53
Yee Hin Chong, Peng Qu

Abstract

Large pre-trained Transformer models achieve state-of-the-art results across diverse language and reasoning tasks, but full fine-tuning incurs substantial storage, memory, and computational overhead. Parameter-efficient fine-tuning (PEFT) methods mitigate these costs by learning only a small subset of task-specific parameters, yet existing approaches either introduce inference-time latency (adapter modules), suffer from suboptimal convergence (randomly initialized low-rank updates), or rely on fixed rank choices that may not match task complexity (Kronecker-based decompositions). We propose SoKA (SVD on Kronecker Adaptation), a novel PEFT strategy that combines Kronecker-product tensor factorization with SVD-driven initialization and spectrum-aware dynamic rank selection. Our Kronecker-Product SVD (KPSVD) procedure extracts principal components of the full weight update into compact Kronecker factors, while an adaptive rank selection algorithm uses energy-threshold and elbow-point criteria to prune negligible components. Empirical evaluation on LLaMA2-7B across arithmetic reasoning (GSM8K), formal mathematics (MATH), and code generation (MBPP) demonstrates that SoKA requires only 0.99M trainable parameters, 25% fewer than LoRA/PiSSA, while matching or exceeding baseline performance. Moreover, SoKA exhibits faster convergence and more stable gradients, highlighting its robustness and efficiency for large-scale model adaptation.
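The two operations named in the abstract can be illustrated compactly. The following is a minimal NumPy sketch, not the authors' implementation: it approximates a weight-update matrix as a short sum of Kronecker products via the Van Loan-Pitsianis rearrangement (one standard way to realize a Kronecker-product SVD) and then selects a rank from the resulting spectrum with an energy threshold combined with a simple elbow heuristic. The factor shapes, the 0.95 energy level, and the synthetic test matrix are illustrative assumptions, not values taken from the paper.

```python
# Sketch of (1) KPSVD via the Van Loan-Pitsianis rearrangement and
# (2) energy-threshold / elbow-point rank selection on the spectrum.
# Shapes and thresholds are assumptions for illustration only.
import numpy as np

def kpsvd(delta_w, m1, m2, n1, n2):
    """Approximate delta_w (m1*m2 x n1*n2) as sum_k kron(A_k, B_k)."""
    assert delta_w.shape == (m1 * m2, n1 * n2)
    # Rearrange so each row is a vectorized (m2 x n2) block of delta_w;
    # the SVD of this matrix yields the nearest Kronecker-product expansion.
    r = (delta_w.reshape(m1, m2, n1, n2)
                .transpose(0, 2, 1, 3)
                .reshape(m1 * n1, m2 * n2))
    u, s, vt = np.linalg.svd(r, full_matrices=False)
    # Split each singular value evenly across the two Kronecker factors.
    a = [np.sqrt(sk) * u[:, k].reshape(m1, n1) for k, sk in enumerate(s)]
    b = [np.sqrt(sk) * vt[k, :].reshape(m2, n2) for k, sk in enumerate(s)]
    return a, b, s

def select_rank(s, energy=0.95):
    """Smallest rank retaining `energy` of the spectrum, capped by a
    simple elbow heuristic (sharpest drop in the log-spectrum)."""
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    r_energy = int(np.searchsorted(cum, energy) + 1)
    if len(s) < 2:
        return 1
    r_elbow = int(np.argmin(np.diff(np.log(s + 1e-12))) + 1)
    return min(r_energy, r_elbow)

# Usage: a synthetic update with true Kronecker rank 3 plus small noise.
rng = np.random.default_rng(0)
dw = sum(np.kron(rng.standard_normal((8, 8)), rng.standard_normal((8, 8)))
         for _ in range(3)) + 0.01 * rng.standard_normal((64, 64))
a, b, s = kpsvd(dw, 8, 8, 8, 8)
rank = select_rank(s)
approx = sum(np.kron(a[k], b[k]) for k in range(rank))
print(rank, np.linalg.norm(dw - approx) / np.linalg.norm(dw))
```

On the synthetic example the selected rank recovers the planted value of 3 with a small relative reconstruction error, which is the behavior the pruning criteria are meant to capture: components past the elbow carry negligible energy and can be dropped.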


URL

https://arxiv.org/abs/2506.15251

PDF

https://arxiv.org/pdf/2506.15251.pdf

