
PMCE: Probabilistic Multi-Granularity Semantics with Caption-Guided Enhancement for Few-Shot Learning

2026-01-20 16:06:23
Jiaying Wu, Can Gao, Jinglu Hu, Hui Li, Xiaofeng Cao, Jingcai Guo

Abstract

Few-shot learning aims to identify novel categories from only a handful of labeled samples, where prototypes estimated from scarce data are often biased and generalize poorly. Semantic-based methods alleviate this by introducing coarse class-level information, but they are mostly applied on the support side, leaving query representations unchanged. In this paper, we present PMCE, a Probabilistic few-shot framework that leverages Multi-granularity semantics with Caption-guided Enhancement. PMCE constructs a nonparametric knowledge bank that stores visual statistics for each category as well as CLIP-encoded class name embeddings of the base classes. At meta-test time, the most relevant base classes are retrieved based on the similarities of class name embeddings for each novel category. These statistics are then aggregated into category-specific prior information and fused with the support set prototypes via a simple MAP update. Simultaneously, a frozen BLIP captioner provides label-free instance-level image descriptions, and a lightweight enhancer trained on base classes optimizes both support prototypes and query features under an inductive protocol with a consistency regularization to stabilize noisy captions. Experiments on four benchmarks show that PMCE consistently improves over strong baselines, achieving up to 7.71% absolute gain over the strongest semantic competitor on MiniImageNet in the 1-shot setting. Our code is available at this https URL
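The retrieval and fusion steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the softmax weighting over retrieved classes, `top_k`, `temperature`, and the prior strength `kappa` are all assumptions made for illustration. The MAP step assumes a Gaussian likelihood with a conjugate Gaussian prior on the class mean, which gives a closed-form weighted average of the prior mean and the support features.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def retrieve_prior(novel_name_emb, base_name_embs, base_means, top_k=3, temperature=0.1):
    """Build a prior mean for one novel class from its nearest base classes.

    novel_name_emb: (d_text,) text embedding of the novel class name.
    base_name_embs: (n_base, d_text) text embeddings of base class names.
    base_means:     (n_base, d_vis) stored visual mean of each base class.
    The softmax weighting is an assumption, not taken from the paper.
    """
    sims = l2_normalize(base_name_embs) @ l2_normalize(novel_name_emb)
    top = np.argsort(-sims)[:top_k]           # indices of the most similar base classes
    logits = sims[top] / temperature
    weights = np.exp(logits - logits.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ base_means[top], top


def map_prototype(support_feats, prior_mean, kappa=1.0):
    """MAP estimate of a class mean under an assumed conjugate Gaussian prior.

    support_feats: (n_shot, d_vis) features of the labeled support samples.
    kappa: hypothetical prior strength; kappa=0 recovers the plain support mean.
    """
    n = support_feats.shape[0]
    return (kappa * prior_mean + support_feats.sum(axis=0)) / (kappa + n)
```

With one support sample at `[2, 2]`, a prior mean of `[0, 0]`, and `kappa=1`, `map_prototype` returns `[1, 1]`: the prior pulls the 1-shot prototype halfway toward the retrieved statistics, and its influence shrinks as more shots are added.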

URL

https://arxiv.org/abs/2601.14111

PDF

https://arxiv.org/pdf/2601.14111.pdf

