Abstract
Few-shot learning aims to identify novel categories from only a handful of labeled samples, where prototypes estimated from scarce data are often biased and generalize poorly. Semantic-based methods alleviate this by introducing coarse class-level information, but they are mostly applied on the support side, leaving query representations unchanged. In this paper, we present PMCE, a Probabilistic few-shot framework that leverages Multi-granularity semantics with Caption-guided Enhancement. PMCE constructs a nonparametric knowledge bank that stores visual statistics for each category as well as CLIP-encoded class name embeddings of the base classes. At meta-test time, the most relevant base classes are retrieved based on the similarities of class name embeddings for each novel category. These statistics are then aggregated into category-specific prior information and fused with the support set prototypes via a simple MAP update. Simultaneously, a frozen BLIP captioner provides label-free instance-level image descriptions, and a lightweight enhancer trained on base classes optimizes both support prototypes and query features under an inductive protocol with a consistency regularization to stabilize noisy captions. Experiments on four benchmarks show that PMCE consistently improves over strong baselines, achieving up to 7.71% absolute gain over the strongest semantic competitor on MiniImageNet in the 1-shot setting. Our code is available at this https URL
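The retrieval and MAP steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the aggregation rule or the form of the prior, so the sketch assumes softmax-weighted averaging of the top-k base-class means and a Gaussian prior on the prototype with a shared isotropic variance. The function names `retrieve_prior` and `map_prototype` and the parameters `k`, `temperature`, and `prior_strength` are hypothetical.

```python
import numpy as np


def retrieve_prior(novel_name_emb, base_name_embs, base_means, k=3, temperature=0.1):
    """Aggregate the visual means of the k base classes whose class-name
    embeddings are most similar to the novel class name (assumed rule)."""
    # Cosine similarity between the novel class name and every base class name.
    q = novel_name_emb / np.linalg.norm(novel_name_emb)
    b = base_name_embs / np.linalg.norm(base_name_embs, axis=1, keepdims=True)
    sims = b @ q
    # Indices of the k most similar base classes.
    top = np.argsort(sims)[-k:]
    # Softmax weights over the retrieved classes (an assumption, not from the paper).
    w = np.exp(sims[top] / temperature)
    w /= w.sum()
    return w @ base_means[top]


def map_prototype(support_feats, prior_mean, prior_strength=1.0):
    """MAP estimate of a Gaussian mean under a Gaussian prior.

    prior_strength is the ratio of observation variance to prior variance
    and behaves like a pseudo-count for the prior mean.
    """
    n = support_feats.shape[0]
    return (support_feats.sum(axis=0) + prior_strength * prior_mean) / (n + prior_strength)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base_name_embs = rng.normal(size=(5, 8))   # stand-in for CLIP class-name embeddings
    base_means = rng.normal(size=(5, 4))       # stand-in for per-class visual means
    novel_name_emb = rng.normal(size=8)
    support = rng.normal(size=(1, 4))          # one support feature (1-shot)

    prior = retrieve_prior(novel_name_emb, base_name_embs, base_means, k=2)
    print(map_prototype(support, prior, prior_strength=1.0))
```

Under these assumptions, a 1-shot prototype with `prior_strength=1.0` is the midpoint between the single support feature and the retrieved prior, and the prior's influence shrinks as the number of support samples grows.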
Abstract (translated)
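Few-shot learning aims to recognize novel categories from only a few labeled samples, but because data are scarce, prototypes estimated from such limited data are often biased and generalize poorly. Semantic-based methods mitigate this by introducing coarse-grained class-level information, but they are applied mainly on the support side and leave query representations unchanged. In this paper, we propose PMCE (Probabilistic few-shot framework with Multi-granularity semantics and Caption-guided Enhancement), a probabilistic few-shot learning framework that uses multi-granularity semantics and caption-guided enhancement. PMCE builds a nonparametric knowledge bank that stores visual statistics for each category together with CLIP-encoded class name embeddings of the base classes. At meta-test time, the most relevant base classes are retrieved for each novel category according to class name embedding similarity. Their statistics are then aggregated into category-specific prior information and fused with the support set prototypes through a simple MAP update. Meanwhile, a frozen BLIP captioner provides label-free instance-level image descriptions, and a lightweight enhancer trained on the base classes refines both support prototypes and query features under an inductive protocol, with a consistency regularization to stabilize noisy captions. Experiments on four benchmarks show that PMCE consistently improves over strong baselines, achieving an absolute gain of up to 7.71% over the strongest semantic competitor on MiniImageNet in the 1-shot setting. Our code is available at the link above.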
URL
https://arxiv.org/abs/2601.14111