Abstract
Video-based person re-identification (re-id) has drawn much attention in recent years due to its prospective applications in video surveillance. Most existing methods concentrate on learning discriminative clip-level feature representations. Clip-level data augmentation is also important, especially for the temporal aggregation task: inconsistent intra-clip augmentation breaks inter-frame alignment and thus introduces additional noise. To tackle the above-mentioned problems, we design a novel framework for video-based person re-id that consists of two main modules: Synchronized Transformation (ST) and Intra-clip Aggregation (ICA). The former augments all frames within a clip with the same probability and the same operation, while the latter leverages two-level intra-clip encoding to generate more discriminative clip-level features. To confirm the advantage of synchronized transformation, we conduct ablation studies with different synchronized transformation schemes. We also perform cross-dataset experiments to better understand the generality of our method. Extensive experiments on three benchmark datasets demonstrate that our framework outperforms most recent state-of-the-art methods.
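The core idea behind synchronized transformation is to sample the random augmentation parameters once per clip and reuse them for every frame, rather than re-sampling per frame. A minimal NumPy sketch of this idea (the function name, crop/flip choice, and parameters are illustrative assumptions, not the paper's actual implementation):

```python
import numpy as np

def synchronized_augment(clip, crop_size=112, flip_prob=0.5, rng=None):
    """Apply the SAME random crop and horizontal flip to every frame in a clip.

    clip: array of shape (T, H, W, C). Sampling the augmentation parameters
    once per clip (instead of once per frame) keeps all frames spatially
    aligned, which is what the ST module requires for temporal aggregation.
    """
    rng = np.random.default_rng() if rng is None else rng
    t, h, w, _ = clip.shape
    # Sample parameters ONCE per clip, not per frame.
    top = int(rng.integers(0, h - crop_size + 1))
    left = int(rng.integers(0, w - crop_size + 1))
    flip = rng.random() < flip_prob
    # The identical transform is then broadcast over the time axis.
    out = clip[:, top:top + crop_size, left:left + crop_size, :]
    if flip:
        out = out[:, :, ::-1, :]  # same horizontal flip for all frames
    return out
```

A per-frame variant would instead draw `top`, `left`, and `flip` inside a loop over `t`, which is exactly the inconsistent intra-clip augmentation the abstract argues against.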
URL
https://arxiv.org/abs/1905.01722