One-to-Few Label Assignment for End-to-End Dense Detection

2023-03-21 03:24:47
Shuai Li, Minghan Li, Ruihuang Li, Chenhang He, Lei Zhang

Abstract

One-to-one (o2o) label assignment plays a key role in transformer-based end-to-end detection, and it has recently been introduced in fully convolutional detectors for end-to-end dense detection. However, o2o can degrade feature learning efficiency because of the limited number of positive samples. Though extra positive samples are introduced to mitigate this issue in recent DETRs, the computation of self- and cross-attention in the decoder limits their practical application to dense and fully convolutional detectors. In this work, we propose a simple yet effective one-to-few (o2f) label assignment strategy for end-to-end dense detection. Apart from defining one positive and many negative anchors for each object, we define several soft anchors, which serve as positive and negative samples simultaneously. The positive and negative weights of these soft anchors are dynamically adjusted during training so that they contribute more to "representation learning" in the early training stage and more to "duplicated prediction removal" in the later stage. A detector trained in this way can not only learn a strong feature representation but also perform end-to-end dense detection. Experiments on the COCO and CrowdHuman datasets demonstrate the effectiveness of the o2f scheme. Code is available via the arXiv page linked below.
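
The abstract does not give the weighting formula, so the following is only a minimal sketch of the idea, not the authors' implementation. It assumes each object has a list of candidate anchors with matching scores, that the top-scoring anchor is the single positive, that the next `num_soft` anchors are the soft anchors, and that the soft anchors' positive weight decays linearly with training progress. The function name, the linear schedule, and the values `start_pos` and `end_pos` are all hypothetical.

```python
def soft_anchor_weights(scores, progress, num_soft=3, start_pos=0.8, end_pos=0.2):
    """Return (pos_weights, neg_weights) for one object's candidate anchors.

    scores   -- matching quality per anchor, higher is better (assumed input)
    progress -- fraction of training completed, in [0, 1]
    The linear schedule and its endpoints are illustrative assumptions,
    not values taken from the paper.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must lie in [0, 1]")

    # Positive weight of soft anchors: large early (representation learning),
    # small late (duplicated prediction removal).
    soft_pos = start_pos + (end_pos - start_pos) * progress

    # Rank anchors from best to worst matching score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Default: every anchor is a pure negative.
    pos = [0.0] * len(scores)
    neg = [1.0] * len(scores)

    for rank, idx in enumerate(order):
        if rank == 0:
            # The single certain positive anchor.
            pos[idx], neg[idx] = 1.0, 0.0
        elif rank <= num_soft:
            # Soft anchors act as positive and negative at the same time.
            pos[idx], neg[idx] = soft_pos, 1.0 - soft_pos

    return pos, neg
```

Under these assumptions, calling `soft_anchor_weights(scores, progress=0.0)` gives the soft anchors a mostly positive role, while `progress=1.0` makes them mostly negative, so that only the top anchor remains a confident prediction for each object.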

URL

https://arxiv.org/abs/2303.11567

PDF

https://arxiv.org/pdf/2303.11567.pdf

