Pedestrian re-identification based on Tree branch network with local and global learning

2019-03-31 07:51:08
Hui Li, Meng Yang, Zhihui Lai, Weishi Zheng, Zitong Yu

Abstract

Deep part-based methods in recent literature have revealed the great potential of learning local, part-level representations of pedestrian images for person re-identification. However, global features that capture discriminative holistic information about the human body are usually ignored or not well exploited. This motivates us to investigate jointly learning global and local features from pedestrian images. Specifically, in this work, we propose a novel framework termed tree branch network (TBN) for person re-identification. Given a pedestrian image, the feature maps generated by the backbone CNN are partitioned recursively into several pieces, each of which is followed by a bottleneck structure that learns finer-grained features for each level in the hierarchical tree-like framework. In this way, representations are learned in a coarse-to-fine manner and finally assembled to produce more discriminative image descriptions. Experimental results demonstrate the effectiveness of the global and local feature learning method in the proposed TBN framework. We also show significant improvement in performance over state-of-the-art methods on three public benchmarks: Market-1501, CUHK-03 and DukeMTMC.
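
The recursive, coarse-to-fine partitioning described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the split direction, the number of branches per node, the tree depth, or the form of the bottleneck. The sketch below assumes a binary split along the height axis, a single-channel feature map, and plain average pooling as a stand-in for the learned bottleneck. The names `pool` and `tree_features` are hypothetical.

```python
from typing import List

# A single-channel feature map: rows (height) x columns (width).
FeatureMap = List[List[float]]


def pool(region: FeatureMap) -> float:
    """Average-pool a region to one value.

    Assumption: this stands in for the learned bottleneck structure,
    which the abstract does not specify.
    """
    values = [v for row in region for v in row]
    return sum(values) / len(values)


def tree_features(fmap: FeatureMap, depth: int) -> List[float]:
    """Return one pooled feature per tree node, ordered coarse to fine.

    Level 0 is the whole map (global feature); each further level splits
    every region into an upper and a lower half (local features).
    """
    features: List[float] = []
    level = [fmap]
    for _ in range(depth + 1):
        features.extend(pool(region) for region in level)
        next_level: List[FeatureMap] = []
        for region in level:
            mid = len(region) // 2
            if mid == 0:
                # A single row cannot be split further.
                continue
            next_level.extend([region[:mid], region[mid:]])
        level = next_level
    return features


if __name__ == "__main__":
    fmap = [[1.0, 1.0], [3.0, 3.0], [5.0, 5.0], [7.0, 7.0]]
    # One global value, then 2 halves, then 4 quarters.
    print(tree_features(fmap, depth=2))
```

Concatenating the per-node values corresponds to the "assembled" descriptor mentioned in the abstract; in a real network each node would produce a feature vector rather than a scalar.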

URL

https://arxiv.org/abs/1904.00355

PDF

https://arxiv.org/pdf/1904.00355.pdf

