Abstract
Gait recognition is a biometric technique that identifies individuals by their unique walking styles; it is well suited to unconstrained environments and has a wide range of applications. While current methods focus on exploiting body part-based representations, they often neglect the hierarchical dependencies between local motion patterns. In this paper, we propose a hierarchical spatio-temporal representation learning (HSTL) framework for extracting gait features from coarse to fine. Our framework starts with a hierarchical clustering analysis to recover multi-level body structures, from the whole body down to local details. Next, an adaptive region-based motion extractor (ARME) is designed to learn region-independent motion features. The proposed HSTL then stacks multiple ARMEs in a top-down manner, with each ARME corresponding to a specific partition level of the hierarchy. An adaptive spatio-temporal pooling (ASTP) module captures gait features at different levels of detail to perform hierarchical feature mapping. Finally, a frame-level temporal aggregation (FTA) module reduces redundant information in gait sequences through multi-scale temporal downsampling. Extensive experiments on the CASIA-B, OUMVLP, GREW, and Gait3D datasets demonstrate that our method outperforms the state of the art while maintaining a reasonable balance between model accuracy and complexity.
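To make the coarse-to-fine idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation and does not reproduce ARME, ASTP, or FTA. It assumes a feature map of shape (channels, frames, height, width), a hand-written `HIERARCHY` of horizontal strips standing in for the partitions that the paper recovers by hierarchical clustering, and plain max pooling in place of the learned adaptive modules. The names `HIERARCHY`, `region_pool`, `hierarchical_descriptor`, and `temporal_downsample` are hypothetical.

```python
import numpy as np

# Hypothetical coarse-to-fine hierarchy: each level lists (start, end) row
# fractions of the silhouette height. These boundaries are illustrative
# assumptions, not the partitions recovered by the paper's clustering.
HIERARCHY = [
    [(0.0, 1.0)],                                          # level 0: whole body
    [(0.0, 0.5), (0.5, 1.0)],                              # level 1: upper / lower
    [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)],  # level 2: finer strips
]


def region_pool(features: np.ndarray, start: float, end: float) -> np.ndarray:
    """Max-pool a (C, T, H, W) feature map over time and one horizontal strip."""
    height = features.shape[2]
    top, bottom = int(round(start * height)), int(round(end * height))
    strip = features[:, :, top:bottom, :]
    return strip.max(axis=(1, 2, 3))  # one C-dimensional vector per region


def hierarchical_descriptor(features: np.ndarray) -> np.ndarray:
    """Stack one pooled vector per region, ordered from coarse to fine."""
    vectors = [
        region_pool(features, start, end)
        for level in HIERARCHY
        for start, end in level
    ]
    return np.stack(vectors)  # shape: (num_regions, C)


def temporal_downsample(features: np.ndarray, window: int) -> np.ndarray:
    """Max-pool non-overlapping windows of frames; trailing frames are dropped."""
    c, t, h, w = features.shape
    usable = (t // window) * window
    grouped = features[:, :usable].reshape(c, usable // window, window, h, w)
    return grouped.max(axis=2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((8, 30, 64, 44))  # (channels, frames, height, width)
    print(hierarchical_descriptor(clip).shape)  # (7, 8): 1 + 2 + 4 regions
    print(temporal_downsample(clip, 3).shape)   # (8, 10, 64, 44)
```

Calling `temporal_downsample` with several window sizes gives views of the same sequence at different temporal scales, which is one simple way to picture multi-scale downsampling; how the paper combines such scales is not specified in the abstract.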
URL
https://arxiv.org/abs/2307.09856