Capturing individual gait patterns while excluding identity-irrelevant cues in walking videos, such as clothing texture and color, remains a persistent challenge for vision-based gait recognition. Traditional silhouette- and pose-based methods, though theoretically effective at removing such distractions, often fall short of high accuracy due to their sparse and less informative inputs. Emerging end-to-end methods address this by directly denoising RGB videos using human priors. Building on this trend, we propose DenoisingGait, a novel gait denoising method. Inspired by the philosophy that "what I cannot create, I do not understand", we turn to generative diffusion models, uncovering how they partially filter out identity-irrelevant factors for gait understanding. Additionally, we introduce a geometry-driven Feature Matching module, which, combined with background removal via human silhouettes, condenses the multi-channel diffusion features at each foreground pixel into a two-channel direction vector. Specifically, the proposed within- and cross-frame matching respectively capture the local vectorized structures of gait appearance and motion, producing a novel flow-like gait representation termed Gait Feature Field, which further reduces residual noise in the diffusion features. Experiments on the CCPG, CASIA-B*, and SUSTech1K datasets demonstrate that DenoisingGait achieves new state-of-the-art (SoTA) performance in most cases for both within- and cross-domain evaluations. Code is available at this https URL.
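The Feature Matching step can be illustrated with a toy within-frame matcher: for every foreground pixel it searches a small neighborhood for the most similar feature vector and keeps only the unit direction toward that best match, collapsing a multi-channel feature into two channels. This is a minimal sketch of the idea, not the paper's implementation; the cosine similarity, window radius, and nested-list data layout are assumptions.

```python
import math

def within_frame_matching(feat, mask, radius=1):
    """Toy within-frame matcher: for each foreground pixel, find the
    most similar feature vector in a (2*radius+1)^2 neighborhood and
    keep only the unit direction toward it, collapsing a multi-channel
    feature map into a two-channel direction field."""
    H, W = len(feat), len(feat[0])

    def cos(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return num / den if den else 0.0

    field = [[(0.0, 0.0)] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            if not mask[i][j]:
                continue  # background pixels carry no gait signal
            best, best_off = -2.0, (0, 0)
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ni, nj = i + di, j + dj
                    if (di, dj) == (0, 0) or not (0 <= ni < H and 0 <= nj < W):
                        continue
                    if mask[ni][nj]:
                        s = cos(feat[i][j], feat[ni][nj])
                        if s > best:
                            best, best_off = s, (di, dj)
            n = math.hypot(*best_off)
            if n:
                field[i][j] = (best_off[0] / n, best_off[1] / n)
    return field
```

Cross-frame matching would work the same way, except the neighborhood is searched in the next frame, yielding a motion field rather than an appearance field.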
https://arxiv.org/abs/2505.18582
Large vision model (LVM)-based gait recognition has achieved impressive performance. However, existing LVM-based approaches may overemphasize gait priors while neglecting the intrinsic value of the LVM itself, particularly the rich, distinct representations across its multiple layers. To adequately unlock the LVM's potential, this work investigates the impact of layer-wise representations on downstream recognition tasks. Our analysis reveals that the LVM's intermediate layers offer complementary properties across tasks; integrating them yields an impressive improvement even without rich, well-designed gait priors. Building on this insight, we propose a simple and universal baseline for LVM-based gait recognition, termed BiggerGait. Comprehensive evaluations on CCPG, CASIA-B*, SUSTech1K, and CCGR_MINI validate the superiority of BiggerGait across both within- and cross-domain tasks, establishing it as a simple yet practical baseline for gait representation learning. All the models and code will be publicly available.
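A minimal sketch of the layer-integration idea (the fusion scheme and per-layer normalization are assumptions for illustration, not BiggerGait's exact design): normalize each selected intermediate-layer embedding so no single layer dominates, then concatenate.

```python
import math

def integrate_layers(layer_feats, keep=None):
    """Fuse per-layer embeddings from a frozen LVM by L2-normalizing
    each selected layer's vector and concatenating them, so layers with
    larger activation scales do not dominate the fused representation."""
    keep = keep if keep is not None else range(len(layer_feats))
    fused = []
    for idx in keep:
        v = layer_feats[idx]
        n = math.sqrt(sum(x * x for x in v)) or 1.0
        fused.extend(x / n for x in v)
    return fused
```

The `keep` argument lets one probe which subset of intermediate layers actually helps a given downstream task, mirroring the layer-wise analysis the abstract describes.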
https://arxiv.org/abs/2505.18132
Current exoskeleton control methods often face challenges in delivering personalized treatment. Standardized walking gaits can lead to patient discomfort or even injury. Therefore, personalized gait is essential for the effectiveness of exoskeleton robots, as it directly impacts their adaptability, comfort, and rehabilitation outcomes for individual users. To enable personalized treatment in exoskeleton-assisted therapy and related applications, accurate recognition of personal gait is crucial for implementing tailored gait control. The key challenge in gait recognition lies in effectively capturing individual differences in subtle gait features caused by joint synergy, such as step frequency and step length. To tackle this issue, we propose a novel approach, which uses Multi-Scale Global Dense Graph Convolutional Networks (GCN) in the spatial domain to identify latent joint synergy patterns. Moreover, we propose a Gait Non-linear Periodic Dynamics Learning module to effectively capture the periodic characteristics of gait in the temporal domain. To support our individual gait recognition task, we have constructed a comprehensive gait dataset that ensures both completeness and reliability. Our experimental results demonstrate that our method achieves an impressive accuracy of 94.34% on this dataset, surpassing the current state-of-the-art (SOTA) by 3.77%. This advancement underscores the potential of our approach to enhance personalized gait control in exoskeleton-assisted therapy.
https://arxiv.org/abs/2505.18018
Generalized gait recognition, which aims to achieve robust performance across diverse domains, remains a challenging problem due to severe domain shifts in viewpoints, appearances, and environments. While mixed-dataset training is widely used to enhance generalization, it introduces new obstacles including inter-dataset optimization conflicts and redundant or noisy samples, both of which hinder effective representation learning. To address these challenges, we propose a unified framework that systematically improves cross-domain gait recognition. First, we design a disentangled triplet loss that isolates supervision signals across datasets, mitigating gradient conflicts during optimization. Second, we introduce a targeted dataset distillation strategy that filters out the least informative 20% of training samples based on feature redundancy and prediction uncertainty, enhancing data efficiency. Extensive experiments on CASIA-B, OU-MVLP, Gait3D, and GREW demonstrate that our method significantly improves cross-dataset recognition for both GaitBase and DeepGaitV2 backbones, without sacrificing source-domain accuracy. Code will be released at this https URL.
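The disentangled triplet loss can be sketched as an ordinary margin-based triplet loss whose anchor/positive/negative triples are formed strictly within one source dataset, so no gradient mixes supervision across domains. A simplified all-triplets form under assumed inputs (embeddings as plain lists, string domain tags):

```python
def disentangled_triplet(embs, labels, domains, margin=0.2):
    """Triplet loss computed independently per source dataset: anchors,
    positives, and negatives are all drawn from the same domain, so
    supervision from different datasets never conflicts through shared
    triplets. Simplified all-triplets form, averaged over triplets."""
    def d(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    total, count = 0.0, 0
    for dom in set(domains):
        idx = [i for i, g in enumerate(domains) if g == dom]
        for a in idx:
            for p in idx:
                if p == a or labels[p] != labels[a]:
                    continue
                for n in idx:
                    if labels[n] == labels[a]:
                        continue
                    total += max(0.0, d(embs[a], embs[p]) - d(embs[a], embs[n]) + margin)
                    count += 1
    return total / count if count else 0.0
```

Compared with a vanilla triplet loss over the mixed batch, the only change is the per-domain index restriction, which is exactly the isolation the abstract describes.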
https://arxiv.org/abs/2505.15176
Gait recognition, known for its ability to identify individuals from a distance, has recently gained significant attention due to its non-intrusive verification. While video-based gait identification systems perform well on large public datasets, their performance drops when applied to real-world, unconstrained gait data due to various factors. Among these, uncontrolled outdoor environments, non-overlapping camera views, varying illumination, and computational efficiency are core challenges in gait-based authentication. Currently, no dataset addresses all these challenges simultaneously. In this paper, we propose an OptiGait-LGBM model capable of person re-identification under these constraints using a skeletal model approach, which helps mitigate inconsistencies in a person's appearance. The model constructs a dataset from landmark positions, minimizing memory usage by using non-sequential data. A benchmark dataset, RUET-GAIT, is introduced to represent uncontrolled gait sequences in complex outdoor environments. The process involves extracting skeletal joint landmarks, generating numerical datasets, and developing an OptiGait-LGBM gait classification model. Our aim is to address the aforementioned challenges with minimal computational cost compared to existing methods. A comparative analysis with ensemble techniques such as Random Forest and CatBoost demonstrates that the proposed approach outperforms them in terms of accuracy, memory usage, and training time. This method provides a novel, low-cost, and memory-efficient video-based gait recognition solution for real-world scenarios.
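The non-sequential numerical dataset construction might look like the following sketch (the row layout and types are assumptions): each frame's (x, y) joint landmarks are flattened into one independent tabular row, which keeps memory low and lets any tabular classifier such as LightGBM consume them.

```python
def landmarks_to_rows(frames, subject_id):
    """Flatten per-frame (x, y) joint landmarks into independent
    numeric rows, one per frame. Treating frames as non-sequential
    samples avoids storing whole sequences; the (features, label)
    pairs can then feed a tabular classifier such as LightGBM."""
    rows = []
    for joints in frames:  # joints: list of (x, y) tuples for one frame
        row = [c for point in joints for c in point]
        rows.append((row, subject_id))
    return rows
```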
https://arxiv.org/abs/2505.08801
Gait recognition has emerged as a powerful tool for unobtrusive and long-range identity analysis, with growing relevance in surveillance and monitoring applications. Although recent advances in deep learning and large-scale datasets have enabled highly accurate recognition under closed-set conditions, real-world deployment demands open-set gait enrollment: determining whether a new gait sample corresponds to a known identity or represents a previously unseen individual. In this work, we introduce a transformer-based framework for open-set gait enrollment that is both dataset-agnostic and recognition-architecture-agnostic. Our method leverages a SetTransformer to make enrollment decisions based on the embedding of a probe sample and a context set drawn from the gallery, without requiring task-specific thresholds or retraining for new environments. By decoupling enrollment from the main recognition pipeline, our model generalizes across different datasets, gallery sizes, and identity distributions. We propose an evaluation protocol that reuses existing datasets with varying ratios of identities and walks per identity. We instantiate our method using skeleton-based gait representations and evaluate it on two benchmark datasets (CASIA-B and PsyMo), using embeddings from three state-of-the-art recognition models (GaitGraph, GaitFormer, and GaitPT). We show that our method is flexible, accurately performs enrollment in different scenarios, and scales better with data than traditional approaches. We will make the code and dataset scenarios publicly available.
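As a rough illustration of the enrollment interface (not the learned SetTransformer head), a nearest-neighbor rule over the context set shows what goes in and out: a probe embedding plus gallery context, a known/new decision back. The fixed `threshold` here is an illustrative assumption and is precisely the kind of hand-tuned quantity the learned model avoids.

```python
import math

def enroll_decision(probe, context, threshold=1.0):
    """Toy stand-in for the SetTransformer enrollment head: decide
    whether a probe embedding matches a known identity by its nearest
    distance to a context set drawn from the gallery. The real model
    learns this decision end-to-end instead of thresholding."""
    nearest = min(math.dist(probe, c) for c in context)
    return "known" if nearest <= threshold else "new identity"
```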
https://arxiv.org/abs/2505.02815
Gait recognition enables contact-free, long-range person identification that is robust to clothing variations and non-cooperative scenarios. While existing methods perform well in controlled indoor environments, they struggle with cross-vertical view scenarios, where surveillance angles vary significantly in elevation. Our experiments show up to 60% accuracy degradation in low-to-high vertical view settings due to severe deformations and self-occlusions of key anatomical features. Current CNN and self-attention-based methods fail to effectively handle these challenges, due to their reliance on single-scale convolutions or simplistic attention mechanisms that lack effective multi-frequency feature integration. To tackle this challenge, we propose CVVNet (Cross-Vertical-View Network), a frequency aggregation architecture specifically designed for robust cross-vertical-view gait recognition. CVVNet employs a High-Low Frequency Extraction module (HLFE) that adopts parallel multi-scale convolution/max-pooling path and self-attention path as high- and low-frequency mixers for effective multi-frequency feature extraction from input silhouettes. We also introduce the Dynamic Gated Aggregation (DGA) mechanism to adaptively adjust the fusion ratio of high- and low-frequency features. The integration of our core Multi-Scale Attention Gated Aggregation (MSAGA) module, HLFE and DGA enables CVVNet to effectively handle distortions from view changes, significantly improving the recognition robustness across different vertical views. Experimental results show that our CVVNet achieves state-of-the-art performance, with 8.6% improvement on DroneGait and 2% on Gait3D compared with the best existing methods.
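Per channel, the gating idea in DGA reduces to a sigmoid-weighted mix of the two frequency branches. A sketch with a scalar gate logit (in CVVNet the gate is predicted from the features themselves, which this sketch does not model):

```python
import math

def dga_fuse(high, low, gate_logit):
    """Dynamic Gated Aggregation, sketched per feature channel: a
    sigmoid gate sets the mixing ratio between the high-frequency
    (convolution/max-pooling) and low-frequency (self-attention)
    responses. gate_logit = 0 gives an even 50/50 blend."""
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    return [g * h + (1.0 - g) * l for h, l in zip(high, low)]
```

Because the gate is input-dependent in the real model, the fusion ratio can shift toward whichever frequency band survives the current viewing elevation.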
https://arxiv.org/abs/2505.01837
In this paper, we propose H-MoRe, a novel pipeline for learning precise human-centric motion representation. Our approach dynamically preserves relevant human motion while filtering out background movement. Notably, unlike previous methods relying on fully supervised learning from synthetic data, H-MoRe learns directly from real-world scenarios in a self-supervised manner, incorporating both human pose and body shape information. Inspired by kinematics, H-MoRe represents absolute and relative movements of each body point in a matrix format that captures nuanced motion details, termed world-local flows. H-MoRe offers refined insights into human motion, which can be integrated seamlessly into various action-related applications. Experimental results demonstrate that H-MoRe brings substantial improvements across various downstream tasks, including gait recognition (CL@R1: +16.01%), action recognition (Acc@1: +8.92%), and video generation (FVD: -67.07%). Additionally, H-MoRe exhibits high inference efficiency (34 fps), making it suitable for most real-time scenarios. Models and code will be released upon publication.
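The world-local decomposition can be sketched for 2D keypoints: the "world" flow is each point's absolute frame-to-frame displacement, and the "local" flow is that displacement relative to a root point, which removes global body motion. The root index and point format are assumptions for illustration, not H-MoRe's actual definition.

```python
def world_local_flows(prev_pts, cur_pts, root=0):
    """Sketch of the world-local flow idea: for each body point, the
    'world' flow is its absolute displacement between frames, and the
    'local' flow subtracts the root point's displacement, isolating
    limb motion from whole-body translation."""
    rdx = cur_pts[root][0] - prev_pts[root][0]
    rdy = cur_pts[root][1] - prev_pts[root][1]
    world, local = [], []
    for (x0, y0), (x1, y1) in zip(prev_pts, cur_pts):
        dx, dy = x1 - x0, y1 - y0
        world.append((dx, dy))
        local.append((dx - rdx, dy - rdy))
    return world, local
```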
https://arxiv.org/abs/2504.10676
Gait recognition from video streams is a challenging problem in computer vision biometrics due to the subtle differences between gaits and numerous confounding factors. Recent advancements in self-supervised pretraining have led to the development of robust gait recognition models that are invariant to walking covariates. While neural scaling laws have transformed model development in other domains by linking performance to data, model size, and compute, their applicability to gait remains unexplored. In this work, we conduct the first empirical scaling study of skeleton-based self-supervised gait recognition, quantifying the effect of data quantity, model size, and compute on downstream gait recognition performance. We pretrain multiple variants of GaitPT, a transformer-based architecture, on a dataset of 2.7 million walking sequences collected in the wild. We evaluate zero-shot performance across four benchmark datasets to derive scaling laws for data, model size, and compute. Our findings demonstrate predictable power-law improvements in performance with increased scale and confirm that data and compute scaling significantly influence downstream accuracy. We further isolate architectural contributions by comparing GaitPT with GaitFormer under controlled compute budgets. These results provide practical insights into resource allocation and performance estimation for real-world gait recognition systems.
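A power law error ≈ a * N^(-b) is typically estimated by linear regression in log-log space. The sketch below recovers (a, b) from (size, error) pairs; it shows the standard fitting procedure for scaling exponents, not code from the paper.

```python
import math

def fit_power_law(sizes, errors):
    """Fit error ≈ a * N^(-b) by least squares in log-log space, the
    usual way scaling-law exponents are estimated. Returns (a, b);
    a larger b means faster improvement with scale."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return math.exp(my - slope * mx), -slope
```

Once (a, b) are fitted on small runs, extrapolating a * N^(-b) to a larger N gives the kind of performance estimate the abstract's resource-allocation insight refers to.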
https://arxiv.org/abs/2504.07598
Gait recognition is emerging as a promising and innovative area within the field of computer vision, widely applied to remote person identification. Although existing gait recognition methods have achieved substantial success on controlled laboratory datasets, their performance often declines significantly when transitioning to in-the-wild datasets. We argue that the performance gap can be primarily attributed to the spatio-temporal distribution inconsistencies present in wild datasets, where subjects appear at varying angles, positions, and distances across the frames. To achieve accurate gait recognition in the wild, we propose a skeleton-guided silhouette alignment strategy, which uses prior knowledge of the skeletons to perform affine transformations on the corresponding silhouettes. To the best of our knowledge, this is the first study to explore the impact of data alignment on gait recognition. We conducted extensive experiments across multiple datasets and network architectures, and the results demonstrate the significant advantages of our proposed alignment strategy. Notably, on the challenging Gait3D dataset, our method achieved an average performance improvement of 7.9% across all evaluated networks. Furthermore, our method achieves substantial improvements on cross-domain datasets, with accuracy improvements of up to 24.0%.
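The skeleton-guided alignment can be sketched as estimating a similarity transform (uniform scale plus translation) from skeleton anchor joints to canonical coordinates, then applying it to every silhouette pixel so subjects match in scale and position. The choice of neck/hip anchors and the canonical frame are illustrative assumptions, not the paper's exact formulation.

```python
def skeleton_guided_affine(neck, hip, canon_neck=(0.0, 0.0), canon_hip=(0.0, 1.0)):
    """Derive a similarity transform mapping the detected neck and hip
    joints onto fixed canonical positions, and return a function that
    applies it to any (x, y) silhouette point. Scale comes from the
    neck-hip distance; translation pins the neck to its canonical spot."""
    src_len = ((hip[0] - neck[0]) ** 2 + (hip[1] - neck[1]) ** 2) ** 0.5
    dst_len = ((canon_hip[0] - canon_neck[0]) ** 2 +
               (canon_hip[1] - canon_neck[1]) ** 2) ** 0.5
    s = dst_len / src_len
    tx = canon_neck[0] - s * neck[0]
    ty = canon_neck[1] - s * neck[1]

    def apply(p):
        return (s * p[0] + tx, s * p[1] + ty)
    return apply
```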
https://arxiv.org/abs/2503.18830
Gait recognition has emerged as a robust biometric modality due to its non-intrusive nature and resilience to occlusion. Conventional gait recognition methods typically rely on silhouettes or skeletons. Despite their success in gait recognition for controlled laboratory environments, they usually fail in real-world scenarios due to their limited information entropy for gait representations. To achieve accurate gait recognition in the wild, we propose a novel gait representation, named Parsing Skeleton. This representation innovatively introduces the skeleton-guided human parsing method to capture fine-grained body dynamics, so they have much higher information entropy to encode the shapes and dynamics of fine-grained human parts during walking. Moreover, to effectively explore the capability of the parsing skeleton representation, we propose a novel parsing skeleton-based gait recognition framework, named PSGait, which takes parsing skeletons and silhouettes as input. By fusing these two modalities, the resulting image sequences are fed into gait recognition models for enhanced individual differentiation. We conduct comprehensive benchmarks on various datasets to evaluate our model. PSGait outperforms existing state-of-the-art multimodal methods. Furthermore, as a plug-and-play method, PSGait leads to a maximum improvement of 10.9% in Rank-1 accuracy across various gait recognition models. These results demonstrate the effectiveness and versatility of parsing skeletons for gait recognition in the wild, establishing PSGait as a new state-of-the-art approach for multimodal gait recognition.
https://arxiv.org/abs/2503.12047
The adoption of Millimeter-Wave (mmWave) radar devices for human sensing, particularly gait recognition, has recently garnered significant attention due to their efficiency, resilience to environmental conditions, and privacy-preserving nature. In this work, we tackle the challenging problem of Open-set Gait Recognition (OSGR) from sparse mmWave radar point clouds. Unlike most existing research, which assumes a closed-set scenario, our work considers the more realistic open-set case, where unknown subjects might be present at inference time, and should be correctly recognized by the system. Point clouds are well-suited for edge computing applications with resource constraints, but are more significantly affected by noise and random fluctuations than other representations, like the more common micro-Doppler signature. This is the first work addressing open-set gait recognition with sparse point cloud data. To do so, we propose a novel neural network architecture that combines supervised classification with unsupervised reconstruction of the point clouds, creating a robust, rich, and highly regularized latent space of gait features. To detect unknown subjects at inference time, we introduce a probabilistic novelty detection algorithm that leverages the structured latent space and offers a tunable trade-off between inference speed and prediction accuracy. Along with this paper, we release mmGait10, an original human gait dataset featuring over five hours of measurements from ten subjects, under varied walking modalities. Extensive experimental results show that our solution attains an average F1-score improvement of 24% over state-of-the-art methods across multiple openness levels.
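One simple instance of probabilistic novelty detection over a structured latent space: score a probe by its best isotropic-Gaussian log-likelihood across the known-class centroids and flag low scores as unknown. The shared variance and Gaussian form are simplifying assumptions, not the paper's exact algorithm.

```python
import math

def novelty_score(z, class_means, var=1.0):
    """Score a latent vector z by its highest isotropic-Gaussian
    log-likelihood over the known-class centroids; low scores suggest
    an unknown subject. Restricting the number of centroids checked
    would trade accuracy for inference speed."""
    d = len(z)
    best = -math.inf
    for mu in class_means:
        sq = sum((a - b) ** 2 for a, b in zip(z, mu))
        ll = -0.5 * (sq / var + d * math.log(2 * math.pi * var))
        best = max(best, ll)
    return best
```

A deployment would compare this score against a calibration threshold chosen for the desired openness level.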
https://arxiv.org/abs/2503.07435
Gait recognition is a computer vision task that identifies individuals based on their walking patterns. Gait recognition performance is commonly evaluated by ranking a gallery of candidates and measuring the accuracy at the top rank K (Rank-K). Existing models are typically single-staged, i.e. searching for the probe's nearest neighbors in a gallery using a single global feature representation. Although these models typically excel at retrieving the correct identity within the top-K predictions, they struggle when hard negatives appear in the top short-list, leading to relatively low performance at the highest ranks (e.g., Rank-1). In this paper, we introduce CarGait, a Cross-Attention Re-ranking method for gait recognition, that re-orders the top-K list by leveraging the fine-grained correlations between pairs of gait sequences through cross-attention between gait strips. This re-ranking scheme can be adapted to existing single-stage models to enhance their final results. We demonstrate the capabilities of CarGait through extensive experiments on three common gait datasets, Gait3D, GREW, and OU-MVLP, and seven different gait models, showing consistent improvements in Rank-1 and Rank-5 accuracy and superior results over existing re-ranking methods and strong baselines.
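The two-stage interface is easy to sketch: the single-stage model supplies a top-K short-list, and a finer pairwise probe-gallery score re-orders it. Here the pairwise scores are given as a plain dict; in CarGait they come from cross-attention between gait strips.

```python
def rerank_topk(topk_ids, pair_scores):
    """Second-stage re-ranking: keep the single-stage model's top-K
    candidate identities but re-order them by a finer pairwise match
    score (higher = better match). Only K pairs need scoring, so the
    expensive pairwise model never touches the full gallery."""
    return sorted(topk_ids, key=lambda gid: pair_scores[gid], reverse=True)
```

This is why re-ranking specifically lifts Rank-1: the correct identity usually already sits somewhere in the short-list, and the pairwise pass just has to move it past the hard negatives.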
https://arxiv.org/abs/2503.03501
Gait refers to the patterns of limb movement generated during walking, which are unique to each individual due to both physical and behavioural traits. Walking patterns have been widely studied in biometrics, biomechanics, sports, and rehabilitation. While traditional methods rely on video and motion capture, advances in underfoot pressure sensing technology now offer deeper insights into gait. However, underfoot pressures during walking remain underexplored due to the lack of large, publicly accessible datasets. To address this, the UNB StepUP database was created, featuring gait pressure data collected with high-resolution pressure sensing tiles (4 sensors/cm², 1.2 m by 3.6 m). Its first release, UNB StepUP-P150, includes over 200,000 footsteps from 150 individuals across various walking speeds (preferred, slow-to-stop, fast, and slow) and footwear types (barefoot, standard shoes, and two personal shoes). As the largest and most comprehensive dataset of its kind, it supports biometric gait recognition while presenting new research opportunities in biomechanics and deep learning. The UNB StepUP-P150 dataset sets a new benchmark for pressure-based gait analysis and recognition.
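A quick consistency check of the stated tile specification (arithmetic only; it implies the total sensing-element count but says nothing about the actual hardware layout):

```python
# 4 sensors/cm² over a 1.2 m x 3.6 m walkway implies the element
# count below — a back-of-envelope check of the quoted resolution.
area_cm2 = 120 * 360       # 1.2 m x 3.6 m expressed in cm²
sensors = 4 * area_cm2     # 4 sensing elements per cm²
print(sensors)             # 172800
```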
https://arxiv.org/abs/2502.17244
Gait recognition is an emerging identification technology that distinguishes individuals at long distances by analyzing individual walking patterns. Traditional techniques rely heavily on large-scale labeled datasets, which incurs high costs and significant labeling challenges. Recently, researchers have explored unsupervised gait recognition with clustering-based unsupervised domain adaptation methods and achieved notable success. However, these methods directly use pseudo-labels generated by clustering and neglect the pseudo-label noise caused by domain differences, which degrades model training. To mitigate these issues, we propose a novel model called GaitDCCR, which aims to reduce the influence of noisy pseudo-labels on clustering and model training. Our approach can be divided into two main stages: a clustering stage and a training stage. In the clustering stage, we propose Dynamic Cluster Parameters (DCP) and Dynamic Weight Centroids (DWC) to improve the efficiency of clustering and obtain reliable cluster centroids. In the training stage, we employ the classical teacher-student structure and propose Confidence-based Pseudo-label Refinement (CPR) and a Contrastive Teacher Module (CTM) to encourage noisy samples to converge towards clusters containing their true identities. Extensive experiments on public gait datasets have demonstrated that our simple and effective method significantly enhances the performance of unsupervised gait recognition, laying the foundation for its application in the real world. Code is available at this https URL
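The confidence-filtering idea behind CPR can be caricatured in a few lines: keep a cluster-assigned pseudo-label only when its soft-assignment confidence clears a threshold, and exclude the rest from the supervised term. The 0.8 threshold is an illustrative assumption, and the real module refines labels rather than merely discarding them.

```python
def refine_pseudo_labels(probs, threshold=0.8):
    """Keep a pseudo-label only when its cluster-assignment confidence
    is high enough; low-confidence samples get label -1 and would be
    left out of (or down-weighted in) the supervised loss."""
    labels = []
    for p in probs:  # p: per-cluster assignment probabilities for one sample
        best = max(range(len(p)), key=lambda k: p[k])
        labels.append(best if p[best] >= threshold else -1)
    return labels
```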
https://arxiv.org/abs/2501.16608
Gait recognition is an important biometric technique over large distances. State-of-the-art gait recognition systems perform very well in controlled environments at close range. Recently, there has been an increased interest in gait recognition in the wild prompted by the collection of outdoor, more challenging datasets containing variations in terms of illumination, pitch angles, and distances. An important problem in these environments is that of occlusion, where the subject is partially blocked from camera view. While important, this problem has received little attention. Thus, we propose MimicGait, a model-agnostic approach for gait recognition in the presence of occlusions. We train the network using a multi-instance correlational distillation loss to capture both inter-sequence and intra-sequence correlations in the occluded gait patterns of a subject, utilizing an auxiliary Visibility Estimation Network to guide the training of the proposed mimic network. We demonstrate the effectiveness of our approach on challenging real-world datasets like GREW, Gait3D and BRIAR. We release the code in this https URL.
https://arxiv.org/abs/2501.15666
Gait recognition is a significant biometric technique for person identification, particularly in scenarios where other physiological biometrics are impractical or ineffective. In this paper, we address the challenges associated with gait recognition and present a novel approach to improve its accuracy and reliability. The proposed method leverages advanced techniques, including sequential gait landmarks obtained through the Mediapipe pose estimation model, Procrustes analysis for alignment, and a Siamese biGRU-dualStack neural network architecture for capturing temporal dependencies. Extensive experiments were conducted on large-scale cross-view datasets, demonstrating the effectiveness of the approach and achieving higher recognition accuracy than competing models. The model demonstrated accuracies of 95.7%, 94.44%, 87.71%, and 86.6% on the CASIA-B, SZU RGB-D, OU-MVLP, and Gait3D datasets respectively. The results highlight the potential applications of the proposed method in various practical domains, indicating its significant contribution to the field of gait recognition.
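Procrustes alignment of 2D landmark sets has a closed form: center both sets, match scale, and rotate by the optimal angle. A stdlib sketch of the general technique (not the paper's code; the aligned source points are returned in the reference's coordinate frame):

```python
import math

def procrustes_align(src, ref):
    """Align 2D point set `src` to `ref` by translation, uniform
    scaling, and the closed-form optimal rotation (2D case of
    orthogonal Procrustes). Returns the transformed src points."""
    def center(pts):
        mx = sum(p[0] for p in pts) / len(pts)
        my = sum(p[1] for p in pts) / len(pts)
        return [(x - mx, y - my) for x, y in pts], (mx, my)

    s, _ = center(src)
    r, rmean = center(ref)
    # Optimal rotation angle maximizing alignment: atan2(N, D) with
    # N = sum cross terms, D = sum dot terms over corresponding points.
    num = sum(sx * ry - sy * rx for (sx, sy), (rx, ry) in zip(s, r))
    den = sum(sx * rx + sy * ry for (sx, sy), (rx, ry) in zip(s, r))
    theta = math.atan2(num, den)
    k = math.sqrt(sum(x * x + y * y for x, y in r)) / \
        (math.sqrt(sum(x * x + y * y for x, y in s)) or 1.0)
    c, sn = math.cos(theta), math.sin(theta)
    return [(k * (c * x - sn * y) + rmean[0],
             k * (sn * x + c * y) + rmean[1]) for x, y in s]
```

Applying this per frame removes camera-dependent translation, scale, and in-plane rotation before the landmark sequences reach the recurrent network.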
https://arxiv.org/abs/2412.03498
Existing studies for gait recognition primarily utilized sequences of either binary silhouette or human parsing to encode the shapes and dynamics of persons during walking. Silhouettes exhibit accurate segmentation quality and robustness to environmental variations, but their low information entropy may result in sub-optimal performance. In contrast, human parsing provides fine-grained part segmentation with higher information entropy, but the segmentation quality may deteriorate due to the complex environments. To discover the advantages of silhouette and parsing and overcome their limitations, this paper proposes a novel cross-granularity alignment gait recognition method, named XGait, to unleash the power of gait representations of different granularity. To achieve this goal, XGait first contains two branches of backbone encoders to map the silhouette sequences and the parsing sequences into two latent spaces, respectively. Moreover, to explore the complementary knowledge across the features of the two representations, we design the Global Cross-granularity Module (GCM) and the Part Cross-granularity Module (PCM) after the two encoders. In particular, the GCM aims to enhance the quality of parsing features by leveraging global features from silhouettes, while the PCM aligns the dynamics of human parts between silhouette and parsing features using the high information entropy in parsing sequences. In addition, to effectively guide the alignment of the two representations with different granularity at the part level, an elaborately designed learnable division mechanism is proposed for the parsing features. Comprehensive experiments on two large-scale gait datasets not only show the superior performance of XGait, with Rank-1 accuracy of 80.5% on Gait3D and 88.3% on CCPG, but also reflect the robustness of the learned features even under challenging conditions like occlusions and cloth changes.
https://arxiv.org/abs/2411.10742
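As a rough illustration of the cross-granularity idea, the toy sketch below re-weights a parsing feature vector with a gate derived from a silhouette global feature. The gating form, the residual fusion, and the name `global_cross_granularity` are our own assumptions for illustration only, not XGait's actual GCM layers.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def global_cross_granularity(parsing_feat, silhouette_feat):
    """Toy sketch of the GCM intuition: a silhouette-derived gate decides,
    channel by channel, how much silhouette global context is injected into
    the (noisier) parsing feature via a residual connection."""
    gate = [sigmoid(s) for s in silhouette_feat]
    return [p + g * s for p, g, s in zip(parsing_feat, gate, silhouette_feat)]
```

With a zero silhouette feature the parsing feature passes through unchanged, so the fusion can only add information, mirroring the stated goal of enhancing parsing quality rather than replacing it.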
Recently, 3D LiDAR has emerged as a promising technique in the field of gait-based person identification, serving as an alternative to traditional RGB cameras due to its robustness under varying lighting conditions and its ability to capture 3D geometric information. However, long capture distances or the use of low-cost LiDAR sensors often result in sparse human point clouds, leading to a decline in identification performance. To address these challenges, we propose a sparse-to-dense upsampling model for pedestrian point clouds in LiDAR-based gait recognition, named LidarGSU, designed to improve the generalization capability of existing identification models. Our method utilizes diffusion probabilistic models (DPMs), which have shown high fidelity in generative tasks such as image completion. In this work, we apply DPMs to sparse sequential pedestrian point clouds, using them as conditional masks in a video-to-video translation approach carried out in an inpainting manner. We conducted extensive experiments on the SUSTech1K dataset to evaluate the generative quality and recognition performance of the proposed method. Furthermore, we demonstrate the applicability of our upsampling model on a real-world dataset captured with a low-resolution sensor across varying measurement distances.
https://arxiv.org/abs/2410.08680
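The conditional-mask inpainting mechanism can be illustrated with a RePaint-style reverse diffusion loop on a 1-D toy signal (the paper itself operates on sequential point clouds). Everything here is a sketch of the general technique under our own assumptions: `denoise_fn` stands in for a trained DPM, and the loop is not LidarGSU's implementation.

```python
import math
import random

def inpaint_diffusion_sketch(observed, mask, denoise_fn, betas,
                             rng=random.Random(0)):
    """RePaint-style conditional inpainting loop (illustrative 1-D sketch):
    at every reverse step, positions where mask == 1 are overwritten with a
    freshly noised copy of the observation, so the model only fills gaps."""
    alphas = [1.0 - b for b in betas]
    abar, prod = [], 1.0
    for a in alphas:                       # cumulative product \bar{alpha}_t
        prod *= a
        abar.append(prod)
    T = len(betas)
    x = [rng.gauss(0.0, 1.0) for _ in observed]   # start from pure noise
    for t in range(T - 1, -1, -1):
        # Known positions: forward-diffuse the observation to noise level t.
        known = [math.sqrt(abar[t]) * o + math.sqrt(1 - abar[t]) * rng.gauss(0, 1)
                 for o in observed]
        x = [k if m else u for k, u, m in zip(known, x, mask)]
        # Unknown positions: one reverse (denoising) step from the model.
        x = denoise_fn(x, t)
    return x
```

The key property is that the sparse observed points act as a hard condition at every step, while the generative model is free only in the masked-out (missing) regions.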
Gait recognition is a remote biometric technology that utilizes the dynamic characteristics of human movement to identify individuals, even under various extreme lighting conditions. Whereas 2D gait representations are limited in spatial perception capability, LiDAR can directly capture 3D gait features and represent them as point clouds, reducing environmental and lighting interference during recognition while significantly advancing privacy protection. For such complex 3D representations, shallow networks fail to achieve accurate recognition, making vision Transformers the most prevalent method. However, the prevalence of dumb patches has limited the widespread use of the Transformer architecture in gait recognition. This paper proposes a method named HorGait, which applies a hybrid model with a Transformer architecture to the planar projection of 3D point clouds from LiDAR for gait recognition. Specifically, it employs a hybrid structure called the LHM Block to achieve input adaptation, long-range, and high-order spatial interaction within the Transformer architecture. Additionally, it uses large-convolutional-kernel CNNs to segment the input representation, replacing attention windows to reduce dumb patches. Extensive experiments show that HorGait achieves state-of-the-art performance among Transformer-based methods on the SUSTech1K dataset, verifying that the hybrid model can complete the full Transformer process and performs better on point cloud planar projections. The outstanding performance of HorGait offers new insights for the future application of the Transformer architecture in gait recognition.
https://arxiv.org/abs/2410.08454
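At its core, the "large convolutional kernel instead of attention windows" idea amounts to sliding one big kernel over the projected point-cloud grid, so every output unit aggregates a wide spatial context in a single operation. The sketch below is a plain valid-mode 2-D cross-correlation (the deep-learning "convolution" convention) in pure Python; it illustrates only this basic operation and is not HorGait's LHM Block.

```python
def large_kernel_conv2d(grid, kernel):
    """Valid-mode 2-D cross-correlation with a single large kernel over a
    planar projection grid. With a k x k kernel, each output cell summarises
    a k x k neighbourhood, standing in for one attention window."""
    H, W = len(grid), len(grid[0])
    k = len(kernel)
    out = []
    for i in range(H - k + 1):
        row = []
        for j in range(W - k + 1):
            # Weighted sum over the k x k receptive field at (i, j).
            row.append(sum(grid[i + a][j + b] * kernel[a][b]
                           for a in range(k) for b in range(k)))
        out.append(row)
    return out
```

Because the kernel is dense over its whole receptive field, no projected point inside the window is skipped, which is one plausible reading of how large kernels avoid the "dumb patch" problem of fixed attention windows.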