Paper Reading AI Learner

Towards Efficient Visual Adaption via Structural Re-parameterization

2023-02-16 06:14:15
Gen Luo, Minglang Huang, Yiyi Zhou, Xiaoshuai Sun, Guannan Jiang, Zhiyu Wang, Rongrong Ji


Parameter-efficient transfer learning (PETL) is an emerging research topic that aims to adapt large-scale pre-trained models to downstream tasks at low cost. By updating or injecting a small number of parameters instead of fine-tuning the full model, recent advances have substantially reduced storage costs for various vision tasks. However, we observe that most existing PETL methods still incur non-negligible latency during inference. In this paper, we propose a parameter-efficient and computationally friendly adapter for giant vision models, called RepAdapter. Specifically, we prove that adaptation modules, even those with a complex structure, can be seamlessly integrated into most giant vision models via structural re-parameterization. This property makes RepAdapter zero-cost during inference. Beyond computational efficiency, RepAdapter is also more effective and lightweight than existing PETL methods, owing to its sparse structure and our careful deployment. To validate RepAdapter, we conduct extensive experiments on 27 benchmark datasets covering three vision tasks: image classification, video classification, and semantic segmentation. Experimental results show that RepAdapter outperforms state-of-the-art PETL methods in both performance and efficiency. For instance, by updating only 0.6% of the parameters, we improve the performance of ViT on Sun397 from 38.8 to 55.1. The generalizability of RepAdapter is further validated on a range of vision models, namely ViT, CLIP, Swin-Transformer, and ConvNeXt. Our source code is released at this https URL.
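The zero-cost claim rests on an algebraic fact: if an adapter is purely linear (no activation) and sits directly in front of a linear projection, the two can be folded into a single weight matrix and bias after training. In row-vector form, an adapter h = x + s((x W_down + b_down) W_up + b_up) can be rewritten as h = x A + c with A = I + s W_down W_up and c = s(b_down W_up + b_up), so the following layer computes h W + b = x (A W) + (c W + b). The NumPy sketch below illustrates this under stated assumptions and is not the authors' released code. The adapter form, its placement directly before a linear layer, the block-diagonal "grouped" up-projection used as a stand-in for the sparse structure, and all function names (grouped_up_projection, adapter_forward, merge_adapter_into_linear) are hypothetical.

```python
import numpy as np

# Hypothetical sketch: a purely linear adapter (no activation) applied to the
# input of a linear layer, and its re-parameterization into that layer.
# Row-vector convention: x has shape (n, d) and layers compute x @ W + b.


def grouped_up_projection(rng, r, d, groups):
    """Block-diagonal up-projection (assumed stand-in for a sparse structure).

    Each of the `groups` slices of the r-dim hidden state feeds only its own
    d // groups output channels; all off-block entries stay zero.
    """
    assert r % groups == 0 and d % groups == 0
    rg, dg = r // groups, d // groups
    w = np.zeros((r, d))
    for g in range(groups):
        w[g * rg:(g + 1) * rg, g * dg:(g + 1) * dg] = rng.standard_normal((rg, dg))
    return w


def adapter_forward(x, w_down, b_down, w_up, b_up, scale):
    """Training-time adapter: h = x + scale * ((x @ w_down + b_down) @ w_up + b_up)."""
    return x + scale * ((x @ w_down + b_down) @ w_up + b_up)


def merge_adapter_into_linear(w, b, w_down, b_down, w_up, b_up, scale):
    """Fold the adapter into the following layer y = h @ w + b.

    Since h = x @ A + c with A = I + scale * w_down @ w_up and
    c = scale * (b_down @ w_up + b_up), we get y = x @ (A @ w) + (c @ w + b).
    """
    a = np.eye(w.shape[0]) + scale * (w_down @ w_up)
    c = scale * (b_down @ w_up + b_up)
    return a @ w, c @ w + b


rng = np.random.default_rng(0)
n, d, r, d_out, scale = 4, 16, 8, 32, 0.1

x = rng.standard_normal((n, d))
w_down, b_down = rng.standard_normal((d, r)), rng.standard_normal(r)
w_up, b_up = grouped_up_projection(rng, r, d, groups=2), rng.standard_normal(d)
w, b = rng.standard_normal((d, d_out)), rng.standard_normal(d_out)

# Training-time path: adapter followed by the original linear layer.
y_train = adapter_forward(x, w_down, b_down, w_up, b_up, scale) @ w + b

# Inference-time path: one linear layer with merged weights, no extra ops.
w_m, b_m = merge_adapter_into_linear(w, b, w_down, b_down, w_up, b_up, scale)
y_infer = x @ w_m + b_m

print(np.allclose(y_train, y_infer))  # True
```

Because the merged weights have exactly the shape of the original projection, inference runs the unmodified backbone graph, which is the sense in which such an adapter adds no extra inference cost. Inserting a nonlinearity between w_down and w_up would break this merge, which is why the sketch keeps the adapter linear.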



