Towards Efficient Visual Adaption via Structural Re-parameterization

Abstract

Parameter-efficient transfer learning (PETL) is an emerging research topic aimed at inexpensively adapting large-scale pre-trained models to downstream tasks. Recent advances have greatly reduced storage costs for various vision tasks by updating or injecting a small number of parameters instead of fine-tuning the full model. However, we notice that most existing PETL methods still incur non-negligible latency during inference. In this paper, we propose a parameter-efficient and computationally friendly adapter for giant vision models, called RepAdapter. Specifically, we prove that the adaption modules, even with a complex structure, can be seamlessly integrated into most giant vision models via structural re-parameterization. This property makes RepAdapter zero-cost during inference. In addition to computation efficiency, RepAdapter is more effective and lightweight than existing PETL methods due to its sparse structure and our careful deployment. To validate RepAdapter, we conduct extensive experiments on 27 benchmark datasets across three vision tasks, i.e., image classification, video classification and semantic segmentation. Experimental results show the superior performance and efficiency of RepAdapter over state-of-the-art PETL methods. For instance, by updating only 0.6% of the parameters, we can improve the performance of ViT from 38.8 to 55.1 on Sun397. Its generalizability is also validated on a range of vision models, i.e., ViT, CLIP, Swin-Transformer and ConvNeXt. Our source code is publicly released.
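The zero-cost claim rests on a simple algebraic fact: a purely linear adapter placed in front of a frozen linear layer can be folded into that layer's weights once training is done. The sketch below illustrates this under stated assumptions. It uses a dense low-rank residual adapter with no nonlinearity and no adapter bias, and the names `w_down`, `w_up`, `s`, and `merge` are hypothetical. It is not the authors' implementation; the abstract does not specify RepAdapter's sparse structure or where it is deployed.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[s * x for x in row] for row in a]

def add_bias(m, bias):
    """Add a bias vector to every row of a matrix."""
    return [[v + b for v, b in zip(row, bias)] for row in m]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def adapter_then_linear(x, w_down, w_up, s, w0, b0):
    """Training-time path: residual linear adapter, then the frozen projection."""
    h = add(x, scale(matmul(matmul(x, w_down), w_up), s))  # x + s * x W_down W_up
    return add_bias(matmul(h, w0), b0)

def merge(w_down, w_up, s, w0):
    """Fold the adapter into the projection: W' = (I + s * W_down W_up) W0."""
    d = len(w0)
    return matmul(add(identity(d), scale(matmul(w_down, w_up), s)), w0)

# Assumed toy sizes: hidden width 6, adapter rank 2, output width 4, batch of 3.
rng = random.Random(0)
d, r, out, batch, s = 6, 2, 4, 3, 0.5
x = rand_matrix(batch, d, rng)
w_down, w_up = rand_matrix(d, r, rng), rand_matrix(r, d, rng)
w0, b0 = rand_matrix(d, out, rng), [rng.uniform(-1.0, 1.0) for _ in range(out)]

y_train = adapter_then_linear(x, w_down, w_up, s, w0, b0)

# Inference-time path: one matrix multiply with the merged weight, no adapter.
w_merged = merge(w_down, w_up, s, w0)
y_merged = add_bias(matmul(x, w_merged), b0)

max_abs_diff = max(abs(p - q) for rp, rq in zip(y_train, y_merged) for p, q in zip(rp, rq))
```

In this toy setting the merged layer has the same shape as the original projection, so inference runs the same number of operations as the unadapted model, and the two paths agree up to floating-point rounding. The merge only works because the adapter is linear; an adapter with a nonlinearity between its two projections could not be folded this way.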

Abstract (translated)

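Parameter-efficient transfer learning (PETL) is an emerging research topic that aims to adapt large-scale pre-trained models to downstream tasks at low cost. Recent advances have reduced storage costs across a variety of vision tasks by updating or injecting a small number of parameters. However, we notice that most existing PETL methods still incur noticeable latency during inference. In this paper, we propose an adapter for giant vision models, called RepAdapter, that is both parameter-efficient and computationally friendly. Specifically, we prove that adaption modules, even those with a complex structure, can be seamlessly merged into most giant vision models via structural re-parameterization. This property makes RepAdapter zero-cost during inference. Beyond computational efficiency, RepAdapter is more effective and more lightweight than existing PETL methods thanks to its sparse structure and our careful deployment. To validate RepAdapter, we conduct extensive experiments on 27 benchmark datasets across three vision tasks, namely image classification, video classification and semantic segmentation. The experimental results show that RepAdapter outperforms state-of-the-art PETL methods in both performance and efficiency. For example, updating only 0.6% of the parameters improves the performance of ViT from 38.8 to 55.1. Its generalizability is also validated on a range of vision models, including ViT, CLIP, Swin-Transformer and ConvNeXt. Our source code is publicly released.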

URL

https://arxiv.org/abs/2302.08106

PDF

https://arxiv.org/pdf/2302.08106.pdf