Abstract
Recent advances in deep learning have achieved performance comparable to human capabilities on various supervised computer vision tasks. However, the prevalent assumption that extensive training data covering all classes is available before model training often diverges from real-world scenarios, where limited data for novel classes is the norm. The challenge lies in seamlessly integrating new classes with few samples into the training data, requiring the model to accommodate these additions without compromising its performance on base classes. To address this need, the research community has proposed several solutions under the umbrella of few-shot class incremental learning (FSCIL). In this study, we introduce an FSCIL framework that employs a language regularizer and a subspace regularizer. During base training, the language regularizer incorporates semantic information extracted from a Vision-Language model. During incremental training, the subspace regularizer helps the model learn nuanced connections between the image and text semantics inherent to the base classes. Our proposed framework not only enables the model to adopt novel classes with limited data but also preserves performance on base classes. To substantiate the efficacy of our approach, we conduct comprehensive experiments on three distinct FSCIL benchmarks, on which our framework attains state-of-the-art performance.
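The abstract does not specify the form of the language regularizer, but a common way to inject semantic information from a vision-language model is to penalize misalignment between each class's visual prototype and the frozen text embedding of its class name. The sketch below is a hypothetical illustration of that idea (the function name, the cosine-based loss, and the inputs are assumptions, not the paper's actual formulation):

```python
import numpy as np

def language_regularizer(prototypes: np.ndarray, text_embeddings: np.ndarray) -> float:
    """Hypothetical language-regularizer sketch.

    prototypes      : (C, D) visual class prototypes learned during base training
    text_embeddings : (C, D) frozen text embeddings of class names from a
                      vision-language model (e.g., a CLIP-style text encoder)

    Returns the mean (1 - cosine similarity) over classes, which is 0 when
    every prototype is perfectly aligned with its text embedding.
    """
    # L2-normalize both sets of vectors row-wise.
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    t = text_embeddings / np.linalg.norm(text_embeddings, axis=1, keepdims=True)
    # Cosine similarity per class, then average the misalignment.
    cos = np.sum(p * t, axis=1)
    return float(np.mean(1.0 - cos))
```

In practice such a term would be added to the base-training classification loss with a weighting coefficient, so that visual features are pulled toward the semantic layout defined by the text encoder.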
URL
https://arxiv.org/abs/2405.01040