Paper Reading AI Learner

Few Shot Class Incremental Learning using Vision-Language models

2024-05-02 06:52:49
Anurag Kumar, Chinmay Bharti, Saikat Dutta, Srikrishna Karanam, Biplab Banerjee

Abstract

Recent advances in deep learning have achieved performance comparable to human capabilities on various supervised computer vision tasks. However, the prevalent assumption that an extensive pool of training data covering all classes is available before model training often diverges from real-world scenarios, where limited data for novel classes is the norm. The challenge is to seamlessly integrate new classes with few samples into the training data, requiring the model to accommodate these additions without compromising its performance on base classes. To address this need, the research community has introduced several solutions under the umbrella of few-shot class incremental learning (FSCIL). In this study, we introduce an innovative FSCIL framework that utilizes a language regularizer and a subspace regularizer. During base training, the language regularizer helps incorporate semantic information extracted from a Vision-Language model. During incremental training, the subspace regularizer helps the model learn nuanced connections between the image and text semantics inherent to the base classes. Our proposed framework not only empowers the model to embrace novel classes with limited data, but also preserves performance on the base classes. To substantiate the efficacy of our approach, we conduct comprehensive experiments on three distinct FSCIL benchmarks, on which our framework attains state-of-the-art performance.
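The abstract does not give the regularizers' formulations, but the two ideas can be sketched under plain assumptions: the language regularizer as a cosine-alignment penalty pulling image features toward the text embedding of their ground-truth class (text embeddings assumed precomputed from a Vision-Language model such as CLIP), and the subspace regularizer as a penalty on the distance between a novel-class prototype and its least-squares projection onto the span of the base-class prototypes. All names and dimensions below are illustrative, not the paper's actual implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Unit-normalize vectors along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def language_regularizer(image_feats, text_embeds, labels):
    """Mean (1 - cosine similarity) between each image feature and the
    text embedding of its ground-truth class. Hypothetical form of the
    base-training language regularizer."""
    img = l2_normalize(image_feats)          # (N, d)
    txt = l2_normalize(text_embeds[labels])  # (N, d), gathered per label
    return float(np.mean(1.0 - np.sum(img * txt, axis=1)))

def subspace_regularizer(novel_protos, base_protos):
    """Mean squared distance from each novel-class prototype to its
    least-squares projection onto the subspace spanned by the base-class
    prototypes. Hypothetical form of the incremental-stage regularizer."""
    B = base_protos.T                                        # (d, n_base)
    coef, *_ = np.linalg.lstsq(B, novel_protos.T, rcond=None)
    proj = (B @ coef).T                                      # (n_novel, d)
    return float(np.mean(np.sum((novel_protos - proj) ** 2, axis=1)))

# Toy example with made-up dimensions (d=512, 10 base classes).
rng = np.random.default_rng(0)
image_feats = rng.normal(size=(4, 512))
text_embeds = rng.normal(size=(10, 512))   # one text embedding per class
labels = np.array([0, 3, 3, 7])
lang_loss = language_regularizer(image_feats, text_embeds, labels)

base_protos = rng.normal(size=(10, 512))
novel_protos = rng.normal(size=(2, 512))
sub_loss = subspace_regularizer(novel_protos, base_protos)
```

In training, both terms would typically be added to the classification loss with tunable weights; the subspace term encourages few-shot novel classes to stay consistent with the structure learned from the data-rich base classes.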

URL

https://arxiv.org/abs/2405.01040

PDF

https://arxiv.org/pdf/2405.01040.pdf
