LEAPS: End-to-End One-Step Person Search With Learnable Proposals

2023-03-21 13:59:32
Zhiqiang Dong, Jiale Cao, Rao Muhammad Anwer, Jin Xie, Fahad Khan, Yanwei Pang

Abstract

We propose LEAPS, an end-to-end one-step person search approach with learnable proposals. Given a set of sparse, learnable proposals, LEAPS employs a dynamic person search head to directly perform person detection and generate the corresponding re-id features, without non-maximum suppression post-processing. The dynamic person search head comprises a detection head and a novel flexible re-id head. The flexible re-id head first employs a dynamic region-of-interest (RoI) operation to extract discriminative RoI features of the proposals. It then generates re-id features using a plain and a hierarchical interaction re-id module. To better guide discriminative re-id feature learning, we introduce a diverse re-id sample matching strategy in place of the bipartite matching used in the detection head. Comprehensive experiments on two public person search benchmarks, CUHK-SYSU and PRW, show that LEAPS achieves favorable performance. With the same ResNet50 backbone, LEAPS obtains a mAP of 55.0% on the challenging PRW dataset, outperforming the best reported results in the literature by 1.7% while achieving around a two-fold speedup. Our source code and models will be released.
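
The abstract contrasts one-to-one bipartite matching in the detection head with a more diverse matching of re-id samples. The toy sketch below illustrates that general contrast only; it is not the authors' implementation. It rests on several assumptions: boxes are `(x1, y1, x2, y2)` tuples, the bipartite cost is reduced to total IoU (real detectors typically also include classification terms), and the "diverse" variant is modeled as a simple IoU threshold. The function names and the `thr` parameter are hypothetical.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def bipartite_match(proposals, gts):
    """One-to-one matching: each ground-truth box gets exactly one proposal.

    Assumption: the matching score is total IoU. Brute force over
    permutations, so this is only suitable for a toy example and
    requires len(proposals) >= len(gts).
    """
    best, best_score = (), -1.0
    for perm in permutations(range(len(proposals)), len(gts)):
        score = sum(iou(proposals[p], gts[g]) for g, p in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    return {p: g for g, p in enumerate(best)}


def threshold_match(proposals, gts, thr=0.5):
    """One-to-many matching (hypothetical stand-in for a diverse strategy).

    Every proposal whose best IoU with a ground-truth box is >= thr
    is assigned to that box, so one person can yield several samples.
    """
    matches = {}
    if not gts:
        return matches
    for p, box in enumerate(proposals):
        ious = [iou(box, gt) for gt in gts]
        g = max(range(len(gts)), key=lambda i: ious[i])
        if ious[g] >= thr:
            matches[p] = g
    return matches


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
proposals = [(0, 0, 10, 10), (1, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]

print(bipartite_match(proposals, gts))   # proposal index -> ground-truth index
print(threshold_match(proposals, gts))
```

In this toy setup the one-to-one matching keeps a single proposal per person, while the thresholded variant also keeps the near-duplicate proposal, which is one way additional re-id training samples could arise.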

URL

https://arxiv.org/abs/2303.11859

PDF

https://arxiv.org/pdf/2303.11859.pdf

