Abstract
Video-based person re-identification deals with the inherent difficulty of matching unregulated sequences of different lengths and with incomplete target pose/viewpoint coverage. Common approaches operate either by reducing the problem to the still-image case, at the cost of significant information loss, or by exploiting inter-sequence temporal dependencies, as in Siamese Recurrent Neural Networks or in gait analysis. However, in all cases, the inter-sequence pose/viewpoint misalignment is not considered, and the existing spatial approaches are mostly limited to the still-image context. To this end, we propose a novel approach that exploits the rich video information more effectively by accounting for the role that the changing pose/viewpoint factor plays in the sequence-matching process. Specifically, our approach consists of two components. The first one complements the original pose-incomplete information carried by the sequences with synthetic GAN-generated images and fuses their feature vectors into a more discriminative, viewpoint-insensitive embedding, namely Weighted Fusion (WF). The second one performs an explicit pose-based alignment of sequence pairs to promote coherent feature matching, namely Weighted-Pose Regulation (WPR). Extensive experiments on two large video-based benchmark datasets show that our approach considerably outperforms existing methods.
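As a rough illustration of the two ideas named above (not the authors' implementation), the minimal NumPy sketch below shows a weighted fusion of per-frame features from real and GAN-synthesized frames into one sequence embedding, and a pose-binned comparison of two sequences so that only frames with similar pose/viewpoint are matched. The function names, the scalar pose-angle binning, and the weighting scheme are assumptions made for illustration only.

```python
# Illustrative sketch only: the actual WF/WPR formulations are defined in the
# paper; the weighting scheme and pose-bin pairing below are assumptions.
import numpy as np

def weighted_fusion(real_feats, synth_feats, real_w, synth_w):
    """Fuse per-frame features of real and GAN-generated frames (WF-style idea).

    real_feats:  (N, D) features of the original frames
    synth_feats: (M, D) features of pose/viewpoint-completing synthetic frames
    real_w, synth_w: non-negative per-frame weights (assumed here; how much to
    trust each source would be learned in practice)
    """
    feats = np.concatenate([real_feats, synth_feats], axis=0)   # (N+M, D)
    w = np.concatenate([real_w, synth_w], axis=0)
    w = w / w.sum()                                              # normalize weights
    return (w[:, None] * feats).sum(axis=0)                      # (D,) sequence embedding

def pose_aligned_distance(feats_a, poses_a, feats_b, poses_b, n_bins=8):
    """Compare two sequences only across frames with similar pose/viewpoint
    (WPR-style idea): bin frames by a scalar pose angle and match per bin."""
    edges = np.linspace(0, 360, n_bins + 1)[1:-1]
    bins_a = np.digitize(poses_a, edges)
    bins_b = np.digitize(poses_b, edges)
    dists = []
    for b in range(n_bins):
        fa, fb = feats_a[bins_a == b], feats_b[bins_b == b]
        if len(fa) and len(fb):                                  # only shared pose bins contribute
            dists.append(np.linalg.norm(fa.mean(0) - fb.mean(0)))
    return np.mean(dists) if dists else np.inf
```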
URL
https://arxiv.org/abs/1903.11552