Abstract
A prevalent approach in Parameter-Efficient Fine-Tuning (PEFT) of pre-trained Vision Transformers (ViT) freezes the majority of the backbone parameters and learns only low-rank adaptation weight matrices to accommodate downstream tasks. These low-rank matrices are commonly constructed as the product of a down-projection and an up-projection matrix, as in methods such as LoRA and Adapter. In this work, we observe that any two row or column vectors within any weight matrix of the backbone parameters are approximately orthogonal; this property, however, is absent in the vectors of the down/up-projection matrices. Approximate orthogonality reduces the upper bound on the model's generalization error, indicating stronger generalization capability. If the fine-tuned down/up-projection matrices exhibited the same property as the pre-trained backbone matrices, could the generalization capability of fine-tuned ViTs be further improved? To address this question, we propose an Approximately Orthogonal Fine-Tuning (AOFT) strategy for representing the low-rank weight matrices. This strategy employs a single learnable vector to generate a set of approximately orthogonal vectors, which form the down/up-projection matrices, thereby aligning the properties of these matrices with those of the backbone. Extensive experimental results demonstrate that our method achieves competitive performance across a range of downstream image classification tasks, confirming the efficacy of the enhanced generalization capability embedded in the down/up-projection matrices.
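The abstract does not spell out how a single learnable vector can yield a set of approximately orthogonal vectors. One plausible construction (a sketch only, not necessarily the paper's exact AOFT method) uses a Householder reflection: the matrix H = I - 2vv^T/||v||^2 built from one vector v is exactly orthogonal, so its first r columns form an orthonormal down-projection. The hidden size 768 and rank 8 below are illustrative assumptions.

```python
import numpy as np

def householder_projection(v, r):
    """Build a d x r matrix with orthonormal columns from a single vector v,
    via the Householder reflection H = I - 2 v v^T / ||v||^2.
    H is orthogonal, so any r of its columns are mutually orthonormal."""
    v = v / np.linalg.norm(v)          # normalize the generating vector
    d = v.shape[0]
    H = np.eye(d) - 2.0 * np.outer(v, v)
    return H[:, :r]                    # take the first r columns

rng = np.random.default_rng(0)
v = rng.standard_normal(768)           # hypothetical ViT hidden dimension
W_down = householder_projection(v, 8)  # rank-8 down-projection, shape (768, 8)

# Columns are orthonormal: W_down^T W_down equals the identity I_r
print(np.allclose(W_down.T @ W_down, np.eye(8)))
```

In a PEFT setting, v would be the learnable parameter, so the projection stays (approximately) orthogonal throughout training while adding only d parameters per matrix instead of d*r.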
URL
https://arxiv.org/abs/2507.13260