Paper Reading AI Learner

Weakly Supervised Person Re-Identification

2019-04-08 04:21:54
Jingke Meng, Sheng Wu, Wei-Shi Zheng

Abstract

In the conventional person re-id setting, it is assumed that the labeled images are person images cropped to the bounding box of each individual; producing such labels across multiple non-overlapping camera views from raw video surveillance is costly and time-consuming. To overcome this difficulty, we consider weakly supervised person re-id modeling. The weak setting refers to matching a target person against an untrimmed gallery video, where we only know that the identity appears somewhere in the video, without annotating the identity in any individual frame during training. Hence, a single video can carry multiple video-level labels. We cast this weakly supervised person re-id challenge as a multi-instance multi-label learning (MIML) problem. In particular, we develop a Cross-View MIML (CV-MIML) method that can discover potential intra-class person images across all camera views by incorporating intra-bag alignment and cross-view bag alignment. Finally, the CV-MIML method is embedded into an existing deep neural network to form the Deep Cross-View MIML (Deep CV-MIML) model. We have performed extensive experiments to show the feasibility of the proposed weakly supervised setting and to verify the effectiveness of our method compared to related methods on four weakly labeled datasets.
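The weakly supervised setting described above can be sketched as a toy multi-instance matching example. This is a minimal illustration, not the paper's actual CV-MIML method: the data, feature dimensions, and the max-over-frames matching rule are all hypothetical assumptions chosen to show the bag/video-level-label structure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: each gallery "bag" is an untrimmed video,
# represented as a (num_frames, feat_dim) array of frame features.
# Only video-level identity labels are known (weak supervision),
# so a single bag may carry multiple labels.
feat_dim = 8
gallery_bags = [rng.normal(size=(5, feat_dim)),
                rng.normal(size=(7, feat_dim))]
bag_labels = [{0, 2}, {1}]  # video-level (multi-)labels per bag

def match_probe(probe_feat, bags):
    """Multi-instance matching sketch: a bag matches a probe as well
    as its best-matching frame does (max over instances)."""
    scores = []
    for bag in bags:
        # cosine similarity between the probe and every frame in the bag
        sims = bag @ probe_feat / (
            np.linalg.norm(bag, axis=1) * np.linalg.norm(probe_feat))
        scores.append(sims.max())
    return np.array(scores)

# Pretend frame 2 of bag 0 shows the target person.
probe = gallery_bags[0][2]
scores = match_probe(probe, gallery_bags)
best_bag = int(np.argmax(scores))  # bag 0 contains an identical frame
```

Because the probe is literally one of bag 0's frames, the max-over-instances rule scores that bag with cosine similarity 1.0, which is the intuition behind treating an untrimmed video as a bag: one matching frame suffices.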

URL

https://arxiv.org/abs/1904.03832

PDF

https://arxiv.org/pdf/1904.03832.pdf
