
Class-relevant Patch Embedding Selection for Few-Shot Image Classification

2024-05-06 02:13:32
Weihao Jiang, Haoyang Cui, Kun He

Abstract

Effective image classification hinges on discerning relevant features from both foreground and background elements, with the foreground typically holding the critical information. While humans adeptly classify images after limited exposure, artificial neural networks often struggle to select features from scarce samples. To address this challenge, we propose a novel method for selecting class-relevant patch embeddings. Our approach splits support and query images into patches and encodes them with a pre-trained Vision Transformer (ViT) to obtain a class embedding and patch embeddings for each image. We then filter the patch embeddings using the class embedding to retain only the class-relevant ones: for each image, we calculate the similarity between the class embedding and each patch embedding, sort the similarities in descending order, and retain only the top-ranked patch embeddings. The selected patch embeddings are fused with the class embedding to form a comprehensive image representation, enhancing pattern recognition across instances. Our strategy effectively mitigates the impact of class-irrelevant patch embeddings, yielding improved performance in pre-trained models. Extensive experiments on popular few-shot classification benchmarks demonstrate the simplicity, efficacy, and computational efficiency of our approach, which outperforms state-of-the-art baselines under both 5-shot and 1-shot scenarios.
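The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the similarity measure, the number of retained patches, or the fusion rule, so cosine similarity, the parameter `k`, and fusion by simple averaging are all assumptions made here for illustration. The function names (`cosine`, `select_patches`, `fuse`) are hypothetical, and plain Python lists stand in for the ViT embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_patches(class_emb, patch_embs, k):
    """Return the indices of the k patches most similar to the class embedding."""
    scores = [cosine(class_emb, p) for p in patch_embs]
    # Sort patch indices by similarity, highest first, and keep the top k.
    order = sorted(range(len(patch_embs)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def fuse(class_emb, patch_embs, kept):
    """Fuse the class embedding with the kept patches by averaging (assumed rule)."""
    vectors = [class_emb] + [patch_embs[i] for i in kept]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy 2-D example: patches 2 and 0 point in nearly the same direction as the
# class embedding, while patches 1 and 3 are unrelated or opposed to it.
class_emb = [1.0, 0.0]
patch_embs = [[0.9, 0.1], [0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]

kept = select_patches(class_emb, patch_embs, k=2)
image_repr = fuse(class_emb, patch_embs, kept)
print(kept)  # [2, 0]
```

In this toy case the two patches aligned with the class embedding are retained and the other two are discarded before fusion, which is the filtering effect the abstract attributes to the method.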


URL

https://arxiv.org/abs/2405.03722

PDF

https://arxiv.org/pdf/2405.03722.pdf

