Class-relevant Patch Embedding Selection for Few-Shot Image Classification

Abstract

Effective image classification hinges on discerning relevant features from both foreground and background elements, with the foreground typically holding the critical information. While humans adeptly classify images with limited exposure, artificial neural networks often struggle with feature selection from rare samples. To address this challenge, we propose a novel method for selecting class-relevant patch embeddings. Our approach splits support and query images into patches and encodes them with a pre-trained Vision Transformer (ViT) to obtain a class embedding and a set of patch embeddings for each image. We then filter the patch embeddings using the class embedding to retain only the class-relevant ones: for each image, we calculate the similarity between the class embedding and each patch embedding, sort the similarities in descending order, and retain only the top-ranked patch embeddings. The selected patch embeddings are fused with the class embedding to form a comprehensive image representation, enhancing pattern recognition across instances. Our strategy effectively mitigates the impact of class-irrelevant patch embeddings, yielding improved performance in pre-trained models. Extensive experiments on popular few-shot classification benchmarks demonstrate the simplicity, efficacy, and computational efficiency of our approach, which outperforms state-of-the-art baselines under both 5-shot and 1-shot scenarios.
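
The selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not name the similarity measure, the number of patches kept, or the fusion operator, so the cosine similarity, the parameter `k`, and the fusion by prepending the class embedding to the kept patches are all assumptions made for illustration, as are the function names `cosine`, `select_patches`, and `fuse`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (assumed measure, not from the paper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_patches(cls_emb, patch_embs, k):
    """Return indices of the k patch embeddings most similar to the class embedding."""
    scored = [(cosine(cls_emb, p), i) for i, p in enumerate(patch_embs)]
    # Sort similarities in descending order; ties keep their original order.
    scored.sort(key=lambda t: t[0], reverse=True)
    return [i for _, i in scored[:k]]


def fuse(cls_emb, patch_embs, kept):
    """Hypothetical fusion: class embedding followed by the kept patch embeddings."""
    return [cls_emb] + [patch_embs[i] for i in kept]


# Toy example with 2-dimensional embeddings standing in for ViT outputs.
cls_emb = [1.0, 0.0]
patch_embs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [-1.0, 0.0]]
kept = select_patches(cls_emb, patch_embs, k=2)  # indices 1 and 2
representation = fuse(cls_emb, patch_embs, kept)
```

In this toy input, the patch pointing in the same direction as the class embedding ranks first and the diagonal patch second, while the orthogonal and opposite patches are discarded as class-irrelevant.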

Abstract (translated)

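Effective image classification depends on discerning relevant features from foreground and background elements, with the foreground usually holding the key information. Although humans can classify images after limited exposure, artificial neural networks often have difficulty selecting features from rare samples. To address this challenge, we propose a new method for selecting class-relevant patch embeddings. Our method splits support and query images into patches and encodes them with a pre-trained Vision Transformer (ViT) to obtain class embeddings and patch embeddings. Next, we filter the patch embeddings using the class embeddings, retaining only the class-relevant ones. For each image, we compute the similarity between the class embedding and each patch embedding, sort the similarities in descending order, and keep only the top-ranked patch embeddings. By prioritizing the similarity between the class embedding and the patch embeddings, we select the top-ranked patch embeddings and fuse them with the class embedding to form a comprehensive image representation, enhancing pattern recognition. By effectively mitigating the influence of class-irrelevant patch embeddings, our strategy yields improvements on pre-trained models. Extensive experiments on popular few-shot classification benchmarks demonstrate the simplicity, effectiveness, and computational efficiency of our method, which outperforms state-of-the-art baselines in both 5-shot and 1-shot settings.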

URL

https://arxiv.org/abs/2405.03722

PDF

https://arxiv.org/pdf/2405.03722.pdf
