Few-Shot Object Detection via Variational Feature Aggregation

2023-01-31 04:58:21
Jiaming Han, Yuqiang Ren, Jian Ding, Ke Yan, Gui-Song Xia

Abstract

Because few-shot object detectors are typically trained with abundant base samples and then fine-tuned on a few novel examples, the learned models are usually biased toward base classes and sensitive to the variance of novel examples. To address this issue, we propose a meta-learning framework with two novel feature aggregation schemes. First, we present a Class-Agnostic Aggregation (CAA) method, in which query and support features can be aggregated regardless of their categories. The interactions between different classes encourage class-agnostic representations and reduce confusion between base and novel classes. Building on CAA, we then propose a Variational Feature Aggregation (VFA) method, which encodes support examples into class-level support features for robust feature aggregation. We use a variational autoencoder to estimate class distributions and sample variational features from these distributions; the sampled features are more robust to the variance of support examples. In addition, we decouple the classification and regression tasks so that VFA is performed on the classification branch without affecting object localization. Extensive experiments on PASCAL VOC and COCO demonstrate that our method significantly outperforms a strong baseline (by up to 16%) and previous state-of-the-art methods (by 4% on average). Code will be available at: this https URL
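
The aggregation idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract gives no architectural details, so the mean pooling over shots, the linear encoders (w_mu, w_logvar), the sigmoid gating used to combine query and support features, and every function name below are assumptions chosen for illustration. The sketch only shows the general pattern of estimating a class-level Gaussian from support features, sampling from it with the reparameterization trick, and aggregating the sample with query features drawn from an arbitrary class.

```python
import numpy as np


def encode_support(support_feats, w_mu, w_logvar):
    """Map support features (n_shots, d) to the mean and log-variance
    of a class-level Gaussian. Mean pooling and linear maps are assumptions."""
    pooled = support_feats.mean(axis=0)
    return pooled @ w_mu, pooled @ w_logvar


def sample_class_feature(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def kl_to_standard_normal(mu, logvar):
    """KL divergence between N(mu, diag(exp(logvar))) and N(0, I),
    the usual regularizer in a variational autoencoder."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)


def aggregate(query_feats, class_feature):
    """Re-weight query channels with a sigmoid gate built from the sampled
    class feature. The gating form is an assumption, not taken from the paper."""
    gate = 1.0 / (1.0 + np.exp(-class_feature))
    return query_feats * gate


# Hypothetical toy data: 3 classes, 5 shots each, 8-dimensional features.
rng = np.random.default_rng(0)
dim = 8
support_by_class = [rng.standard_normal((5, dim)) for _ in range(3)]
w_mu = 0.1 * rng.standard_normal((dim, dim))
w_logvar = 0.1 * rng.standard_normal((dim, dim))
query_feats = rng.standard_normal((4, dim))  # 4 query region features

# Class-agnostic step: the support class is picked at random,
# independent of the query's own category.
support_class = int(rng.integers(len(support_by_class)))
mu, logvar = encode_support(support_by_class[support_class], w_mu, w_logvar)
class_feature = sample_class_feature(mu, logvar, rng)
aggregated = aggregate(query_feats, class_feature)
kl_penalty = kl_to_standard_normal(mu, logvar)
```

In a detector following the abstract's description, features like `aggregated` would feed only the classification branch, while box regression would use the unmodified query features; that wiring is not shown here.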

URL

https://arxiv.org/abs/2301.13411

PDF

https://arxiv.org/pdf/2301.13411.pdf

