Paper Reading AI Learner

Person Search in Videos with One Portrait Through Visual and Temporal Links

2018-07-27 09:39:28
Qingqiu Huang, Wentao Liu, Dahua Lin

Abstract

In real-world applications, e.g. law enforcement and video retrieval, one often needs to search a certain person in long videos with just one portrait. This is much more challenging than the conventional settings for person re-identification, as the search may need to be carried out in the environments different from where the portrait was taken. In this paper, we aim to tackle this challenge and propose a novel framework, which takes into account the identity invariance along a tracklet, thus allowing person identities to be propagated via both the visual and the temporal links. We also develop a novel scheme called Progressive Propagation via Competitive Consensus, which significantly improves the reliability of the propagation process. To promote the study of person search, we construct a large-scale benchmark, which contains 127K manually annotated tracklets from 192 movies. Experiments show that our approach remarkably outperforms mainstream person re-id methods, raising the mAP from 42.16% to 62.27%.

Abstract (translated)

在实际应用中,例如执法和视频检索,人们经常需要只用一个肖像在长视频中搜索某个人。这比人们重新识别的传统设置更具挑战性,因为搜索可能需要在与拍摄肖像的地方不同的环境中进行。在本文中,我们的目标是解决这一挑战并提出一个新的框架,该框架考虑了沿轨迹的身份不变性,从而允许通过视觉和时间链接传播人物身份。我们还开发了一种名为“通过竞争共识实现渐进式传播”的新方案,该方案显着提高了传播过程的可靠性。为了促进人物搜索的研究,我们构建了一个大型基准测试,其中包含来自192部电影的127K手动注释轨迹。实验表明,我们的方法明显优于主流人员re-id方法,将mAP从42.16%提高到62.27%。

URL

https://arxiv.org/abs/1807.10510

PDF

https://arxiv.org/pdf/1807.10510.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot