Abstract
Existing studies on gait recognition have primarily used sequences of either binary silhouettes or human parsing maps to encode the shapes and dynamics of persons while walking. Silhouettes offer accurate segmentation and robustness to environmental variations, but their low information entropy may lead to sub-optimal performance. In contrast, human parsing provides fine-grained part segmentation with higher information entropy, but its segmentation quality may deteriorate in complex environments. To exploit the advantages of silhouettes and parsing while overcoming their limitations, this paper proposes a novel cross-granularity alignment gait recognition method, named XGait, to unleash the power of gait representations of different granularity. To this end, XGait first uses two backbone encoder branches to map the silhouette sequences and the parsing sequences into two separate latent spaces. Moreover, to explore the complementary knowledge across the features of the two representations, we design the Global Cross-granularity Module (GCM) and the Part Cross-granularity Module (PCM), placed after the two encoders. In particular, the GCM aims to enhance the quality of parsing features by leveraging global features from silhouettes, while the PCM aligns the dynamics of human parts between silhouette and parsing features using the high information entropy of parsing sequences. In addition, to effectively guide the part-level alignment of the two representations with different granularity, an elaborately designed learnable division mechanism is proposed for the parsing features. Comprehensive experiments on two large-scale gait datasets not only show the superior performance of XGait, with Rank-1 accuracies of 80.5% on Gait3D and 88.3% on CCPG, but also reflect the robustness of the learned features even under challenging conditions such as occlusions and clothing changes.
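The abstract contrasts the low information entropy of silhouettes with the higher information entropy of parsing maps but does not define the measure. Below is a minimal sketch of one plausible reading: the Shannon entropy of the per-pixel label distribution of a single frame. The toy 4x4 maps, the head/torso/legs label set, and the helper names `label_entropy_bits` and `max_entropy_bits` are hypothetical illustrations, not taken from the paper, Gait3D, or CCPG.

```python
import math
from collections import Counter


def label_entropy_bits(label_map):
    """Shannon entropy (bits) of the empirical label distribution of a 2-D label map.

    Hypothetical helper: one possible reading of "information entropy" in the abstract.
    """
    counts = Counter(label for row in label_map for label in row)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def max_entropy_bits(num_labels):
    """Upper bound on per-pixel entropy: log2 of the number of possible label values."""
    return math.log2(num_labels)


# Toy silhouette (hypothetical): 0 = background, 1 = person.
silhouette = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]

# Toy parsing map of the same person (hypothetical label set):
# 0 = background, 1 = head, 2 = torso, 3 = legs.
parsing = [
    [0, 1, 1, 0],
    [0, 2, 2, 0],
    [0, 3, 3, 0],
    [0, 3, 3, 0],
]

print(label_entropy_bits(silhouette))  # 1.0 (binary labels can carry at most 1 bit)
print(label_entropy_bits(parsing))     # 1.75 (bounded by log2(4) = 2 bits)
```

Splitting the foreground into parts raises the entropy from 1.0 to 1.75 bits even though the person's outline is identical, which matches the intuition that parsing carries finer-grained information than a binary silhouette. A real parsing model uses more part labels, so its upper bound is higher still.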
URL
https://arxiv.org/abs/2411.10742