Abstract
Prior computer vision research extensively explores adapting pre-trained vision transformers (ViTs) to downstream tasks. However, the substantial number of parameters requiring adaptation has led to a focus on Parameter Efficient Transfer Learning (PETL), which efficiently adapts large pre-trained models by training only a subset of parameters, achieving both parameter and storage efficiency. Although training significantly fewer parameters has shown promising performance under transfer learning scenarios, the structural redundancy inherent in the model still leaves room for improvement, which warrants further investigation. In this paper, we propose Head-level Efficient Adaptation with Taylor-expansion importance score (HEAT): a simple method that efficiently fine-tunes ViTs at the head level. In particular, a first-order Taylor expansion is employed to calculate each head's importance score, termed the Taylor-expansion Importance Score (TIS), indicating its contribution to a specific task. Additionally, three strategies for calculating TIS are employed to maximize its effectiveness; these strategies calculate TIS from different perspectives, reflecting the varying contributions of parameters. Beyond ViT, HEAT has also been applied to hierarchical transformers such as the Swin Transformer, demonstrating its versatility across different transformer architectures. Through extensive experiments, HEAT demonstrates superior performance over state-of-the-art PETL methods on the VTAB-1K benchmark.
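The abstract describes scoring each attention head with a first-order Taylor expansion of the loss. A common form of such a score (e.g., in Taylor-based pruning literature) approximates the change in loss from removing a head as the absolute value of the sum of gradient-times-weight over that head's parameters. The sketch below is a minimal, hypothetical illustration of that idea in plain Python; the function name, the aggregation over parameters, and the toy data are assumptions, not the paper's exact formulation or its three TIS strategies.

```python
def taylor_importance(weights, grads):
    """First-order Taylor importance of one attention head (hypothetical sketch).

    Approximates |change in loss| if the head were removed:
        I(h) ~= | sum_i dL/dw_i * w_i |
    where the sum runs over the head's parameters.
    """
    return abs(sum(g * w for g, w in zip(grads, weights)))


# Toy example: three heads, each with four parameters (made-up numbers).
head_weights = [[0.5, -0.2, 0.1, 0.7],
                [0.0,  0.0, 0.0, 0.0],
                [1.2, -0.9, 0.3, 0.4]]
head_grads   = [[0.1,  0.3, -0.2, 0.0],
                [0.5, -0.1, 0.2, 0.3],
                [0.2,  0.1, -0.4, 0.1]]

scores = [taylor_importance(w, g) for w, g in zip(head_weights, head_grads)]
# Rank heads by importance (most important first); heads with low scores
# would be left frozen or pruned, adapting only the important ones.
ranking = sorted(range(len(scores)), key=lambda h: scores[h], reverse=True)
```

Note that a head whose weights are all zero gets a score of zero regardless of its gradients, which is exactly the behavior the gradient-times-weight product encodes: removing an all-zero head changes nothing to first order.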
URL
https://arxiv.org/abs/2404.08894