Paper Reading AI Learner

GAN-based Pose-aware Regulation for Video-based Person Re-identification

2019-03-27 17:14:51
Alessandro Borgia, Yang Hua, Elyor Kodirov, Neil M. Robertson

Abstract

Video-based person re-identification deals with the inherent difficulty of matching unregulated sequences of different lengths and with incomplete target pose/viewpoint structure. Common approaches either reduce the problem to the still-image case, incurring significant information loss, or exploit inter-sequence temporal dependencies, as in Siamese Recurrent Neural Networks or gait analysis. In all cases, however, pose/viewpoint misalignment between sequences is not considered, and existing spatial approaches are mostly limited to the still-image context. To address this, we propose a novel approach that exploits the rich video information more effectively by accounting for the role that the changing pose/viewpoint factor plays in the sequence matching process. Specifically, our approach consists of two components. The first complements the original pose-incomplete information carried by the sequences with synthetic GAN-generated images and fuses their feature vectors into a more discriminative, viewpoint-insensitive embedding, namely Weighted Fusion (WF). The second performs an explicit pose-based alignment of sequence pairs to promote coherent feature matching, namely Weighted-Pose Regulation (WPR). Extensive experiments on two large video-based benchmark datasets show that our approach considerably outperforms existing methods.
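The abstract only names the two components, so the following is a rough illustrative sketch of the underlying ideas, not the paper's actual method: a weighted fusion of real and GAN-synthesized frame features into one normalized embedding, and a pose-label-based pairing of frames across two sequences. All function names, the weighting scheme, and the coarse pose labels are hypothetical assumptions.

```python
import numpy as np
from collections import defaultdict

def weighted_fusion(real_feats, synth_feats, synth_weight=0.5):
    """Hypothetical sketch of the WF idea: fuse features of observed frames
    with features of GAN-generated pose-completion images into a single
    viewpoint-insensitive embedding.

    real_feats:   (n_real, d) feature vectors from observed frames
    synth_feats:  (n_synth, d) feature vectors from synthetic images
    synth_weight: down-weights the (presumably less reliable) synthetic part
    """
    real_mean = real_feats.mean(axis=0)
    synth_mean = synth_feats.mean(axis=0)
    fused = (real_mean + synth_weight * synth_mean) / (1.0 + synth_weight)
    # L2-normalize so embeddings are compared on a common scale
    return fused / np.linalg.norm(fused)

def pose_aligned_pairs(poses_a, poses_b):
    """Hypothetical sketch of the WPR idea: pair frames of two sequences
    that share the same coarse pose/viewpoint label, so feature distances
    are computed only between pose-coherent frames."""
    by_pose = defaultdict(list)
    for j, p in enumerate(poses_b):
        by_pose[p].append(j)
    pairs = []
    for i, p in enumerate(poses_a):
        for j in by_pose.get(p, []):
            pairs.append((i, j))
    return pairs
```

For instance, two sequences with pose labels `[0, 1]` and `[1, 0]` would yield the pose-coherent index pairs `(0, 1)` and `(1, 0)`, and the sequence-level distance would then be aggregated only over such pairs.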


URL

https://arxiv.org/abs/1903.11552

PDF

https://arxiv.org/pdf/1903.11552.pdf
