Paper Reading AI Learner

Conditional Prototype Rectification Prompt Learning

2024-04-15 15:43:52
Haoxing Chen, Yaohui Li, Zizheng Huang, Yan Hong, Zhuoer Xu, Zhangxuan Gu, Jun Lan, Huijia Zhu, Weiqiang Wang

Abstract

Pre-trained large-scale vision-language models (VLMs) have acquired a profound understanding of general visual concepts. Recent advancements in efficient transfer learning (ETL) have shown remarkable success in fine-tuning VLMs within the scenario of limited data, introducing only a few parameters to harness task-specific insights from VLMs. Despite significant progress, current leading ETL methods tend to overfit the narrow distributions of base classes seen during training and encounter two primary challenges: (i) only utilizing uni-modal information to model task-specific knowledge; and (ii) using costly and time-consuming methods to supplement knowledge. To address these issues, we propose a Conditional Prototype Rectification Prompt Learning (CPR) method to correct the bias of base examples and augment limited data in an effective way. Specifically, we alleviate overfitting on base classes from two aspects. First, each input image acquires knowledge from both textual and visual prototypes, and then generates sample-conditional text tokens. Second, we extract utilizable knowledge from unlabeled data to further refine the prototypes. These two strategies mitigate biases stemming from base classes, yielding a more effective classifier. Extensive experiments on 11 benchmark datasets show that our CPR achieves state-of-the-art performance on both few-shot classification and base-to-new generalization tasks. Our code is available at \url{this https URL}.
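For readers who want a concrete picture of the prototype-based ingredients mentioned in the abstract, the snippet below is a minimal PyTorch sketch, not the paper's actual CPR implementation: it fuses textual and visual class prototypes built from a handful of labeled base examples, optionally refines them with confidently pseudo-labeled unlabeled images, and classifies queries by cosine similarity. The function names, the mixing weights `alpha`/`beta`, and the confidence threshold are illustrative assumptions; CPR's sample-conditional text tokens and its exact rectification scheme are defined in the paper itself.

```python
import torch
import torch.nn.functional as F

def fuse_prototypes(text_feats, support_feats, support_labels, num_classes, alpha=0.5):
    """Fuse textual and visual class prototypes (illustrative sketch).

    text_feats:     [C, D] prompt/class-name embeddings from a text encoder.
    support_feats:  [N, D] image embeddings of the few labeled base-class examples.
    support_labels: [N]    integer class labels of the support images.
    alpha:          assumed mixing weight between textual and visual prototypes.
    """
    d = text_feats.size(1)
    visual_protos = torch.zeros(num_classes, d, device=text_feats.device)
    for c in range(num_classes):
        # Visual prototype = mean embedding of the support images of class c.
        visual_protos[c] = support_feats[support_labels == c].mean(dim=0)

    protos = alpha * F.normalize(text_feats, dim=-1) + \
             (1 - alpha) * F.normalize(visual_protos, dim=-1)
    return F.normalize(protos, dim=-1)


def refine_with_unlabeled(protos, unlabeled_feats, threshold=0.9, beta=0.3):
    """Pseudo-label unlabeled images with the current prototypes and fold
    high-confidence samples back into them (illustrative only)."""
    unlabeled_feats = F.normalize(unlabeled_feats, dim=-1)
    probs = (unlabeled_feats @ protos.t() * 100).softmax(dim=-1)
    conf, pseudo = probs.max(dim=-1)

    refined = protos.clone()
    for c in range(protos.size(0)):
        mask = (pseudo == c) & (conf > threshold)
        if mask.any():
            refined[c] = (1 - beta) * protos[c] + beta * unlabeled_feats[mask].mean(dim=0)
    return F.normalize(refined, dim=-1)


def classify(query_feats, protos, temperature=0.01):
    """Cosine-similarity classifier over the (refined) fused prototypes."""
    return F.normalize(query_feats, dim=-1) @ protos.t() / temperature
```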

Abstract (translated)

Pre-trained large-scale vision-language models (VLMs) have acquired a profound understanding of general visual concepts. Recent advances in efficient transfer learning (ETL) have shown that VLMs can be fine-tuned with remarkable success under limited data, introducing only a few parameters to exploit the task-specific knowledge in VLMs. Despite this notable progress, current leading ETL methods tend to overfit the narrow distribution of base classes seen during training and face two main challenges: (i) they rely solely on uni-modal information to model task-specific knowledge; and (ii) they use costly and time-consuming methods to supplement knowledge. To address these issues, we propose a Conditional Prototype Rectification Prompt Learning (CPR) method that corrects the bias of base examples and effectively augments the limited data. Specifically, we alleviate overfitting on base classes from two aspects: first, each input image acquires knowledge from both textual and visual prototypes and then generates sample-conditional text tokens; second, we extract usable knowledge from unlabeled data to further refine the prototypes. These two strategies mitigate the biases stemming from the base classes and yield a more effective classifier. Extensive experiments on 11 benchmark datasets show that our CPR achieves state-of-the-art performance on both few-shot classification and base-to-new generalization tasks. Our code is available at: this https URL.

URL

https://arxiv.org/abs/2404.09872

PDF

https://arxiv.org/pdf/2404.09872.pdf

