Abstract
Gait recognition holds the promise of robustly identifying subjects from their walking patterns rather than from appearance information. In recent years, this field has been dominated by learning methods built on two principal input representations: dense silhouette masks or sparse pose keypoints. In this work, we propose a novel point-based Contour-Pose representation that compactly expresses both body shape and body part information. We further propose a local-to-global architecture, called GaitContour, that leverages this representation to efficiently compute subject embeddings in two stages. The first stage consists of a local transformer that extracts features from five different body regions. The second stage then aggregates the regional features to estimate a global human gait representation. This design significantly reduces the complexity of the attention operation while improving both efficiency and performance. Through large-scale experiments, GaitContour is shown to perform significantly better than previous point-based methods, while also being substantially more efficient than silhouette-based methods. On challenging datasets with significant distractors, GaitContour can even outperform silhouette-based methods.
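The two-stage local-to-global design described above can be sketched in plain Python. This is a hypothetical illustration only: the region names, the toy point coordinates, and the centroid pooling that stands in for the paper's local and global transformers are all assumptions, not the paper's actual method.

```python
# Hypothetical sketch of GaitContour's two-stage local-to-global
# aggregation over a point-based Contour-Pose frame. Region names and
# the centroid "features" are illustrative stand-ins for the paper's
# local/global transformers.

from statistics import mean

# Assumed partition of Contour-Pose points into five body regions.
REGIONS = ["head", "torso", "left_arm", "right_arm", "legs"]

def local_stage(points_by_region):
    """Stage 1: one feature per body region.

    A simple centroid stands in for the local transformer. Restricting
    attention to within-region points is what shrinks the attention
    cost: five small problems instead of one over all N points.
    """
    feats = {}
    for region, pts in points_by_region.items():
        feats[region] = (mean(p[0] for p in pts), mean(p[1] for p in pts))
    return feats

def global_stage(regional_feats):
    """Stage 2: aggregate regional features into one gait embedding.

    A flat concatenation stands in for the global aggregator.
    """
    embedding = []
    for region in REGIONS:
        embedding.extend(regional_feats[region])
    return embedding

# Toy Contour-Pose frame: a few 2D points per region.
frame = {
    "head": [(0.0, 1.8), (0.1, 1.9)],
    "torso": [(0.0, 1.2), (0.1, 1.3), (-0.1, 1.1)],
    "left_arm": [(-0.3, 1.2)],
    "right_arm": [(0.3, 1.2)],
    "legs": [(0.0, 0.5), (0.1, 0.2)],
}

embedding = global_stage(local_stage(frame))
```

In the real architecture, each stage would be a learned transformer producing higher-dimensional features, but the data flow (per-region extraction, then cross-region aggregation) follows the same shape.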
URL
https://arxiv.org/abs/2311.16497