Abstract
Gait, an unobtrusive biometric, is valued for its ability to identify individuals at a distance, across changes in clothing and environmental conditions. This study challenges the prevailing assumption that vision-based gait recognition, in particular skeleton-based gait recognition, relies primarily on motion patterns, revealing the significant role of implicit anthropometric information encoded in the walking sequence. Through a comparative analysis, we show that removing height information leads to notable performance degradation across three models and two benchmarks (CASIA-B and GREW). Furthermore, we propose a spatial transformer model that processes individual poses, disregarding all temporal information, yet achieves unreasonably good accuracy, underscoring the bias towards appearance information and indicating spurious correlations in existing benchmarks. These findings call for a nuanced understanding of the interplay between motion and appearance in vision-based gait recognition, prompting a reevaluation of the methodological assumptions in this field. Our experiments indicate that "in-the-wild" datasets are less prone to spurious correlations, highlighting the need for more diverse and large-scale datasets to advance the field.
URL
https://arxiv.org/abs/2402.08320