Gait recognition has emerged as a robust biometric modality due to its non-intrusive nature and resilience to occlusion. Conventional gait recognition methods typically rely on silhouettes or skeletons. Despite their success in controlled laboratory environments, they usually fail in real-world scenarios because these representations carry limited information entropy. To achieve accurate gait recognition in the wild, we propose a novel gait representation, named Parsing Skeleton. This representation introduces a skeleton-guided human parsing method to capture fine-grained body dynamics, so it has much higher information entropy to encode the shapes and dynamics of fine-grained human parts during walking. Moreover, to effectively explore the capability of the parsing skeleton representation, we propose a novel parsing skeleton-based gait recognition framework, named PSGait, which takes parsing skeletons and silhouettes as input. The two modalities are fused, and the resulting image sequences are fed into gait recognition models for enhanced individual differentiation. We conduct comprehensive benchmarks on various datasets to evaluate our model. PSGait outperforms existing state-of-the-art multimodal methods. Furthermore, as a plug-and-play method, PSGait leads to a maximum improvement of 10.9% in Rank-1 accuracy across various gait recognition models. These results demonstrate the effectiveness and versatility of parsing skeletons for gait recognition in the wild, establishing PSGait as a new state-of-the-art approach for multimodal gait recognition.
https://arxiv.org/abs/2503.12047
The adoption of Millimeter-Wave (mmWave) radar devices for human sensing, particularly gait recognition, has recently garnered significant attention due to their efficiency, resilience to environmental conditions, and privacy-preserving nature. In this work, we tackle the challenging problem of Open-set Gait Recognition (OSGR) from sparse mmWave radar point clouds. Unlike most existing research, which assumes a closed-set scenario, our work considers the more realistic open-set case, where unknown subjects might be present at inference time and should be correctly recognized as such by the system. Point clouds are well-suited for edge computing applications with resource constraints, but are more significantly affected by noise and random fluctuations than other representations, like the more common micro-Doppler signature. This is the first work addressing open-set gait recognition with sparse point cloud data. To do so, we propose a novel neural network architecture that combines supervised classification with unsupervised reconstruction of the point clouds, creating a robust, rich, and highly regularized latent space of gait features. To detect unknown subjects at inference time, we introduce a probabilistic novelty detection algorithm that leverages the structured latent space and offers a tunable trade-off between inference speed and prediction accuracy. Along with this paper, we release mmGait10, an original human gait dataset featuring over five hours of measurements from ten subjects, under varied walking modalities. Extensive experimental results show that our solution improves the F1-Score by 24% on average over state-of-the-art methods, across multiple openness levels.
https://arxiv.org/abs/2503.07435
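The open-set setting described in the mmWave abstract above can be illustrated with a deliberately simple rejection rule. This is a minimal sketch, not the paper's probabilistic novelty detection algorithm: it assumes each known subject is summarized by a hypothetical centroid in the latent space and that a probe is labeled "unknown" when its nearest centroid is farther than a hand-picked threshold. The names `centroids`, `threshold`, and `open_set_predict` are illustrative assumptions.

```python
import math


def nearest_centroid(feature, centroids):
    """Return (subject_id, distance) of the closest known-subject centroid."""
    best_id, best_dist = None, math.inf
    for subject_id, centroid in centroids.items():
        dist = math.dist(feature, centroid)  # Euclidean distance
        if dist < best_dist:
            best_id, best_dist = subject_id, dist
    return best_id, best_dist


def open_set_predict(feature, centroids, threshold):
    """Closed-set decision plus a distance-based reject option (assumed rule)."""
    subject_id, dist = nearest_centroid(feature, centroids)
    return subject_id if dist <= threshold else "unknown"


# Hypothetical 2-D latent features for two enrolled subjects.
centroids = {"s1": [0.0, 0.0], "s2": [5.0, 5.0]}
near_s1 = open_set_predict([0.2, 0.1], centroids, threshold=1.0)
far_from_all = open_set_predict([2.5, 2.5], centroids, threshold=1.0)
```

Raising the threshold accepts more probes as known subjects (fewer false rejections, more false accepts); lowering it does the opposite.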
Gait recognition is a computer vision task that identifies individuals based on their walking patterns. Gait recognition performance is commonly evaluated by ranking a gallery of candidates and measuring the accuracy at the top Rank-$K$. Existing models are typically single-stage, i.e., they search for the probe's nearest neighbors in a gallery using a single global feature representation. Although these models typically excel at retrieving the correct identity within the top-$K$ predictions, they struggle when hard negatives appear in the top short-list, leading to relatively low performance at the highest ranks (e.g., Rank-1). In this paper, we introduce CarGait, a Cross-Attention Re-ranking method for gait recognition that re-orders the top-$K$ list by leveraging the fine-grained correlations between pairs of gait sequences through cross-attention between gait strips. This re-ranking scheme can be adapted to existing single-stage models to enhance their final results. We demonstrate the capabilities of CarGait through extensive experiments on three common gait datasets (Gait3D, GREW, and OU-MVLP) and seven different gait models, showing consistent improvements in Rank-1 and Rank-5 accuracy and superior results over existing re-ranking methods and strong baselines.
https://arxiv.org/abs/2503.03501
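The Rank-$K$ evaluation mentioned in the CarGait abstract above can be sketched as follows. This is a generic illustration of single-stage retrieval and Rank-K accuracy, assuming Euclidean distance on global features; the function names and the toy features are hypothetical and it does not implement the cross-attention re-ranking itself.

```python
import math


def rank_k_accuracy(probes, gallery, k):
    """Fraction of probes whose true identity appears among the k nearest
    gallery entries. probes/gallery are lists of (identity, feature)."""
    hits = 0
    for probe_id, probe_feat in probes:
        # Single-stage retrieval: sort the gallery by distance to the probe.
        ranked = sorted(gallery, key=lambda entry: math.dist(probe_feat, entry[1]))
        top_ids = [gallery_id for gallery_id, _ in ranked[:k]]
        hits += probe_id in top_ids
    return hits / len(probes)


# Toy data: probe "B" sits closer to gallery "A" than to gallery "B",
# i.e. a hard negative occupies Rank-1 while the true match is at Rank-2.
gallery = [("A", [0.0, 0.0]), ("B", [1.0, 0.0]), ("C", [0.0, 1.0])]
probes = [("A", [0.1, 0.0]), ("B", [0.4, 0.0]), ("C", [0.0, 0.9])]

rank1 = rank_k_accuracy(probes, gallery, k=1)
rank2 = rank_k_accuracy(probes, gallery, k=2)
```

In this toy case the correct identity for probe "B" is inside the top-2 list but not first, which is exactly the situation a re-ranking stage over the top-$K$ list is meant to repair.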
Gait refers to the patterns of limb movement generated during walking, which are unique to each individual due to both physical and behavioural traits. Walking patterns have been widely studied in biometrics, biomechanics, sports, and rehabilitation. While traditional methods rely on video and motion capture, advances in underfoot pressure sensing technology now offer deeper insights into gait. However, underfoot pressures during walking remain underexplored due to the lack of large, publicly accessible datasets. To address this, the UNB StepUP database was created, featuring gait pressure data collected with high-resolution pressure sensing tiles (4 sensors/cm², 1.2 m by 3.6 m). Its first release, UNB StepUP-P150, includes over 200,000 footsteps from 150 individuals across various walking speeds (preferred, slow-to-stop, fast, and slow) and footwear types (barefoot, standard shoes, and two personal shoes). As the largest and most comprehensive dataset of its kind, it supports biometric gait recognition while presenting new research opportunities in biomechanics and deep learning. The UNB StepUP-P150 dataset sets a new benchmark for pressure-based gait analysis and recognition.
https://arxiv.org/abs/2502.17244
Gait recognition is an emerging identification technology that distinguishes individuals at long distances by analyzing individual walking patterns. Traditional techniques rely heavily on large-scale labeled datasets, which incur high costs and significant labeling challenges. Recently, researchers have explored unsupervised gait recognition with clustering-based unsupervised domain adaptation methods and achieved notable success. However, these methods directly use pseudo-labels generated by clustering and neglect the pseudo-label noise caused by domain differences, which degrades model training. To mitigate these issues, we propose a novel model called GaitDCCR, which aims to reduce the influence of noisy pseudo-labels on clustering and model training. Our approach can be divided into two main stages: a clustering stage and a training stage. In the clustering stage, we propose Dynamic Cluster Parameters (DCP) and Dynamic Weight Centroids (DWC) to improve the efficiency of clustering and obtain reliable cluster centroids. In the training stage, we employ the classical teacher-student structure and propose Confidence-based Pseudo-label Refinement (CPR) and Contrastive Teacher Module (CTM) to encourage noisy samples to converge towards clusters containing their true identities. Extensive experiments on public gait datasets have demonstrated that our simple and effective method significantly enhances the performance of unsupervised gait recognition, laying the foundation for its practical application. The code is available at this https URL.
https://arxiv.org/abs/2501.16608
Gait recognition is an important biometric technique over large distances. State-of-the-art gait recognition systems perform very well in controlled environments at close range. Recently, there has been increased interest in gait recognition in the wild, prompted by the collection of more challenging outdoor datasets containing variations in illumination, pitch angles, and distances. An important problem in these environments is occlusion, where the subject is partially blocked from the camera view. While important, this problem has received little attention. Thus, we propose MimicGait, a model-agnostic approach for gait recognition in the presence of occlusions. We train the network using a multi-instance correlational distillation loss to capture both inter-sequence and intra-sequence correlations in the occluded gait patterns of a subject, utilizing an auxiliary Visibility Estimation Network to guide the training of the proposed mimic network. We demonstrate the effectiveness of our approach on challenging real-world datasets like GREW, Gait3D and BRIAR. We release the code at this https URL.
https://arxiv.org/abs/2501.15666
Gait recognition is a significant biometric technique for person identification, particularly in scenarios where other physiological biometrics are impractical or ineffective. In this paper, we address the challenges associated with gait recognition and present a novel approach to improve its accuracy and reliability. The proposed method leverages advanced techniques, including sequential gait landmarks obtained through the Mediapipe pose estimation model, Procrustes analysis for alignment, and a Siamese biGRU-dualStack Neural Network architecture for capturing temporal dependencies. Extensive experiments were conducted on large-scale cross-view datasets to demonstrate the effectiveness of the approach, achieving high recognition accuracy compared to other models. The model demonstrated accuracies of 95.7%, 94.44%, 87.71%, and 86.6% on CASIA-B, SZU RGB-D, OU-MVLP, and Gait3D datasets respectively. The results highlight the potential applications of the proposed method in various practical domains, indicating its significant contribution to the field of gait recognition.
https://arxiv.org/abs/2412.03498
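The Procrustes alignment step named in the abstract above can be sketched in two dimensions. This is a minimal illustration, assuming 2-D landmarks and a single reference shape; the paper's landmarks come from Mediapipe and its exact alignment procedure is not given in the abstract, so `procrustes_align_2d` and the toy points are hypothetical.

```python
import math


def _center_and_scale(points):
    """Remove translation (centroid at origin) and scale (unit Frobenius norm)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    norm = math.sqrt(sum(x * x + y * y for x, y in centered))
    return [(x / norm, y / norm) for x, y in centered]


def procrustes_align_2d(points, reference):
    """Align `points` to `reference` up to translation, uniform scale, rotation.
    Returns (aligned_points, normalized_reference)."""
    p = _center_and_scale(points)
    r = _center_and_scale(reference)
    # Closed-form least-squares rotation angle for 2-D point sets.
    num = sum(px * ry - py * rx for (px, py), (rx, ry) in zip(p, r))
    den = sum(px * rx + py * ry for (px, py), (rx, ry) in zip(p, r))
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    aligned = [(c * x - s * y, s * x + c * y) for x, y in p]
    return aligned, r


# The same triangle, rotated by 90 degrees, scaled by 3, and shifted.
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
observed = [(5.0, -2.0), (5.0, 1.0), (-1.0, -2.0)]
aligned, ref_norm = procrustes_align_2d(observed, reference)
```

After alignment, `aligned` and `ref_norm` coincide up to floating-point error, so pose differences caused only by camera position or subject size no longer affect downstream features.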
Existing studies on gait recognition primarily use sequences of either binary silhouettes or human parsing to encode the shapes and dynamics of persons during walking. Silhouettes exhibit accurate segmentation quality and robustness to environmental variations, but their low information entropy may result in sub-optimal performance. In contrast, human parsing provides fine-grained part segmentation with higher information entropy, but the segmentation quality may deteriorate in complex environments. To combine the advantages of silhouettes and parsing and overcome their limitations, this paper proposes a novel cross-granularity alignment gait recognition method, named XGait, to unleash the power of gait representations of different granularity. To achieve this goal, XGait first contains two branches of backbone encoders that map the silhouette sequences and the parsing sequences into two latent spaces, respectively. Moreover, to explore the complementary knowledge across the features of the two representations, we design the Global Cross-granularity Module (GCM) and the Part Cross-granularity Module (PCM) after the two encoders. In particular, the GCM aims to enhance the quality of parsing features by leveraging global features from silhouettes, while the PCM aligns the dynamics of human parts between silhouette and parsing features using the high information entropy in parsing sequences. In addition, to effectively guide the alignment of two representations with different granularity at the part level, an elaborately designed learnable division mechanism is proposed for the parsing features. Comprehensive experiments on two large-scale gait datasets not only show the superior performance of XGait, with Rank-1 accuracy of 80.5% on Gait3D and 88.3% on CCPG, but also reflect the robustness of the learned features even under challenging conditions like occlusions and cloth changes.
https://arxiv.org/abs/2411.10742
Recently, 3D LiDAR has emerged as a promising technique in the field of gait-based person identification, serving as an alternative to traditional RGB cameras, due to its robustness under varying lighting conditions and its ability to capture 3D geometric information. However, long capture distances or the use of low-cost LiDAR sensors often result in sparse human point clouds, leading to a decline in identification performance. To address these challenges, we propose a sparse-to-dense upsampling model for pedestrian point clouds in LiDAR-based gait recognition, named LidarGSU, which is designed to improve the generalization capability of existing identification models. Our method utilizes diffusion probabilistic models (DPMs), which have shown high fidelity in generative tasks such as image completion. In this work, we leverage DPMs on sparse sequential pedestrian point clouds as conditional masks in a video-to-video translation approach, applied in an inpainting manner. We conducted extensive experiments on the SUSTech1K dataset to evaluate the generative quality and recognition performance of the proposed method. Furthermore, we demonstrate the applicability of our upsampling model using a real-world dataset, captured with a low-resolution sensor across varying measurement distances.
https://arxiv.org/abs/2410.08680
Gait recognition is a remote biometric technology that utilizes the dynamic characteristics of human movement to identify individuals even under various extreme lighting conditions. 2D gait representations are inherently limited in spatial perception, whereas LiDAR can directly capture 3D gait features and represent them as point clouds, reducing environmental and lighting interference in recognition while significantly advancing privacy protection. For complex 3D representations, shallow networks fail to achieve accurate recognition, making vision Transformers the most prevalent method. However, the prevalence of dumb patches has limited the widespread use of the Transformer architecture in gait recognition. This paper proposes a method named HorGait, which utilizes a hybrid model with a Transformer architecture for gait recognition on the planar projection of 3D point clouds from LiDAR. Specifically, it employs a hybrid model structure called LHM Block to achieve input adaptation, long-range, and high-order spatial interaction of the Transformer architecture. Additionally, it uses large convolutional kernel CNNs to segment the input representation, replacing attention windows to reduce dumb patches. We conducted extensive experiments, and the results show that HorGait achieves state-of-the-art performance among Transformer architecture methods on the SUSTech1K dataset, verifying that the hybrid model can complete the full Transformer process and perform better in point cloud planar projection. The outstanding performance of HorGait offers new insights for the future application of the Transformer architecture in gait recognition.
https://arxiv.org/abs/2410.08454
Gait recognition is a rapidly progressing technique for the remote identification of individuals. Prior research, predominantly employing 2D sensors to gather gait data, has achieved notable advancements; nonetheless, it has unavoidably neglected the influence of 3D dynamic characteristics on recognition. Gait recognition utilizing LiDAR 3D point clouds not only directly captures 3D spatial features but also diminishes the impact of lighting conditions while ensuring privacy protection. The essence of the problem lies in how to effectively extract discriminative 3D dynamic representations from point clouds. In this paper, we propose a method named SpheriGait for extracting and enhancing dynamic features from point clouds for LiDAR-based gait recognition. Specifically, it substitutes the conventional point cloud plane projection method with spherical projection to augment the perception of dynamic features. Additionally, a network block named DAM-L is proposed to extract gait cues from the projected point cloud data. We conducted extensive experiments, and the results demonstrate that SpheriGait achieves state-of-the-art performance on the SUSTech1K dataset and verify that the spherical projection method can serve as a universal data preprocessing technique to enhance the performance of other LiDAR-based gait recognition methods, exhibiting exceptional flexibility and practicality.
https://arxiv.org/abs/2409.11869
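The spherical projection mentioned in the SpheriGait abstract above can be illustrated with a generic range-image construction. This is a sketch under stated assumptions, not the paper's preprocessing: the image size, the vertical field of view `v_fov`, and the rule of keeping the nearest return per pixel are all illustrative choices.

```python
import math


def spherical_projection(points, height=4, width=8,
                         v_fov=(-math.pi / 4, math.pi / 4)):
    """Project (x, y, z) LiDAR points onto a height x width range image.
    Columns index azimuth, rows index elevation, pixel value is range."""
    image = [[0.0] * width for _ in range(height)]
    v_min, v_max = v_fov
    for x, y, z in points:
        rng = math.sqrt(x * x + y * y + z * z)
        if rng == 0.0:
            continue  # the sensor origin has no defined direction
        azimuth = math.atan2(y, x)        # in [-pi, pi]
        elevation = math.asin(z / rng)    # in [-pi/2, pi/2]
        if not v_min <= elevation <= v_max:
            continue  # outside the assumed vertical field of view
        col = min(int((azimuth + math.pi) / (2 * math.pi) * width), width - 1)
        row = min(int((v_max - elevation) / (v_max - v_min) * height), height - 1)
        # Keep the nearest return when several points hit the same pixel.
        if image[row][col] == 0.0 or rng < image[row][col]:
            image[row][col] = rng
    return image


# Two points along the same ray (the nearer one wins) and one point
# straight above the sensor, which falls outside the assumed field of view.
cloud = [(2.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 5.0)]
range_image = spherical_projection(cloud)
```

Unlike a planar (orthographic) projection, each pixel here corresponds to a viewing direction from the sensor, so the image layout follows how the LiDAR actually samples the scene.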
Gait recognition has attracted increasing attention from academia and industry as a technology for recognizing humans at a distance, non-intrusively and without requiring cooperation. Although advanced methods have achieved impressive success in lab scenarios, most of them perform poorly in the wild. Recently, some Convolutional Neural Network (ConvNet)-based methods have been proposed to address the issue of gait recognition in the wild. However, the temporal receptive field obtained by convolution operations is limited for long gait sequences. If convolution blocks are directly replaced with visual transformer blocks, the model may not enhance the local temporal receptive field, which is important for covering a complete gait cycle. To address this issue, we design a Global-Local Temporal Receptive Field Network (GLGait). GLGait employs a Global-Local Temporal Module (GLTM) to establish a global-local temporal receptive field, which mainly consists of a Pseudo Global Temporal Self-Attention (PGTA) and a temporal convolution operation. Specifically, PGTA is used to obtain a pseudo global temporal receptive field with less memory and computation complexity compared with multi-head self-attention (MHSA). The temporal convolution operation is used to enhance the local temporal receptive field. Besides, it can also aggregate the pseudo global temporal receptive field into a true holistic temporal receptive field. Furthermore, we also propose a Center-Augmented Triplet Loss (CTL) in GLGait to reduce the intra-class distance and expand the positive samples in the training stage. Extensive experiments show that our method obtains state-of-the-art results on in-the-wild datasets, i.e., Gait3D and GREW. The code is available at this https URL.
https://arxiv.org/abs/2408.06834
Gait recognition with radio frequency (RF) signals enables many potential applications requiring accurate identification. However, current systems require individuals to be within a line-of-sight (LOS) environment and struggle with low signal-to-noise ratio (SNR) when signals traverse concrete and thick walls. To address these challenges, we present TRGR, a novel transmissive reconfigurable intelligent surface (RIS)-aided gait recognition system. TRGR can recognize human identities through walls using only the magnitude measurements of channel state information (CSI) from a pair of transceivers. Specifically, by leveraging transmissive RIS alongside a configuration alternating optimization algorithm, TRGR enhances wall penetration and signal quality, enabling accurate gait recognition. Furthermore, a residual convolution network (RCNN) is proposed as the backbone network to learn robust human information. Experimental results confirm the efficacy of transmissive RIS, highlighting its significant potential in enhancing RF-based gait recognition systems. Extensive experimental results show that TRGR achieves an average accuracy of 97.88% in identifying persons when signals traverse concrete walls, demonstrating the effectiveness and robustness of TRGR.
https://arxiv.org/abs/2407.21566
Biometric recognition has primarily addressed closed-set identification, assuming all probe subjects are in the gallery. However, most practical applications involve open-set biometrics, where probe subjects may or may not be present in the gallery. This poses distinct challenges in effectively distinguishing individuals in the gallery while minimizing false detections. While it is commonly believed that powerful biometric models can excel in both closed- and open-set scenarios, existing loss functions are inconsistent with open-set evaluation. They treat genuine (mated) and imposter (non-mated) similarity scores symmetrically and neglect the relative magnitudes of imposter scores. To address these issues, we simulate open-set evaluation using minibatches during training and introduce novel loss functions: (1) the identification-detection loss optimized for open-set performance under selective thresholds and (2) relative threshold minimization to reduce the maximum negative score for each probe. Across diverse biometric tasks, including face recognition, gait recognition, and person re-identification, our experiments demonstrate the effectiveness of the proposed loss functions, significantly enhancing open-set performance while positively impacting closed-set performance. Our code and models are available at this https URL.
https://arxiv.org/abs/2407.16133
Gait recognition is a biometric technology that recognizes the identity of humans through their walking patterns. Existing appearance-based methods utilize CNNs or Transformers to extract spatial and temporal features from silhouettes, while model-based methods employ GCNs to focus on the special topological structure of skeleton points. However, the quality of silhouettes is limited by complex occlusions, and skeletons lack dense semantic features of the human body. To tackle these problems, we propose a novel gait recognition framework, dubbed Gait Multi-model Aggregation Network (GaitMA), which effectively combines two modalities to obtain a more robust and comprehensive gait representation for recognition. First, skeletons are represented by joint/limb-based heatmaps, and features from silhouettes and skeletons are respectively extracted using two CNN-based feature extractors. Second, a co-attention alignment module is proposed to align the features by element-wise attention. Finally, we propose a mutual learning module, which achieves feature fusion through cross-attention; a Wasserstein loss is further introduced to ensure the effective fusion of the two modalities. Extensive experimental results demonstrate the superiority of our model on Gait3D, OU-MVLP, and CASIA-B.
https://arxiv.org/abs/2407.14812
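The GaitMA abstract above represents skeletons as joint/limb-based heatmaps so that a CNN can consume them. A common way to build a joint heatmap is to place a 2-D Gaussian at each joint location; the sketch below assumes that construction. The function name, the `sigma` value, and the max-merge of overlapping joints are illustrative assumptions, not details taken from the paper.

```python
import math


def joint_heatmap(joints, height, width, sigma=1.0):
    """Render (x, y) joint coordinates as a single-channel Gaussian heatmap.
    Overlapping joints are merged with a per-pixel maximum."""
    heatmap = [[0.0] * width for _ in range(height)]
    for joint_x, joint_y in joints:
        for row in range(height):
            for col in range(width):
                squared_dist = (col - joint_x) ** 2 + (row - joint_y) ** 2
                value = math.exp(-squared_dist / (2 * sigma ** 2))
                heatmap[row][col] = max(heatmap[row][col], value)
    return heatmap


# One hypothetical joint at column 2, row 1 of a 4 x 5 grid.
single_joint_map = joint_heatmap([(2, 1)], height=4, width=5)
```

The peak value is 1.0 at the joint pixel and decays smoothly with distance, which gives the sparse skeleton a dense, image-like form that can be processed alongside silhouettes.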
Gait recognition is a biometric technology that distinguishes individuals by their walking patterns. However, previous methods face challenges when accurately extracting identity features because they often become entangled with non-identity clues. To address this challenge, we propose CLTD, a causality-inspired discriminative feature learning module designed to effectively eliminate the influence of confounders in triple domains, i.e., spatial, temporal, and spectral. Specifically, we utilize the Cross Pixel-wise Attention Generator (CPAG) to generate attention distributions for factual and counterfactual features in spatial and temporal domains. Then, we introduce the Fourier Projection Head (FPH) to project spatial features into the spectral space, which preserves essential information while reducing computational costs. Additionally, we employ an optimization method with contrastive learning to enforce semantic consistency constraints across sequences from the same subject. Our approach has demonstrated significant performance improvements on challenging datasets, proving its effectiveness. Moreover, it can be seamlessly integrated into existing gait recognition methods.
https://arxiv.org/abs/2407.12519
Gait recognition, which aims at identifying individuals by their walking patterns, has achieved great success based on silhouettes. The binary silhouette sequence encodes the walking pattern within the sparse boundary representation. Therefore, most pixels in the silhouette are under-sensitive to the walking pattern, since the sparse boundary lacks the dense spatial-temporal information that a dense texture can represent. To enhance the sensitivity to the walking pattern while maintaining the robustness of recognition, we present a Complementary Learning with neural Architecture Search (CLASH) framework, consisting of a walking-pattern-sensitive gait descriptor named dense spatial-temporal field (DSTF) and neural architecture search based complementary learning (NCL). Specifically, DSTF transforms the representation from the sparse binary boundary into the dense distance-based texture, which is sensitive to the walking pattern at the pixel level. Further, NCL presents a task-specific search space for complementary learning, which mutually complements the sensitivity of DSTF and the robustness of the silhouette to represent the walking pattern effectively. Extensive experiments demonstrate the effectiveness of the proposed methods under both in-the-lab and in-the-wild scenarios. On CASIA-B, we achieve rank-1 accuracy of 98.8%, 96.5%, and 89.3% under three conditions. On OU-MVLP, we achieve rank-1 accuracy of 91.9%. On the latest in-the-wild datasets, we outperform the latest silhouette-based methods by 16.3% and 19.7% on Gait3D and GREW, respectively.
https://arxiv.org/abs/2407.03632
Gait recognition is a crucial biometric identification technique. Camera-based gait recognition has been widely applied in both research and industrial fields. LiDAR-based gait recognition has also begun to develop recently, because LiDAR provides 3D structural information. However, in certain applications, such as low-light environments and long-distance recognition scenarios, cameras fail to recognize persons while LiDARs work well. On the other hand, the deployment cost and complexity of LiDAR systems limit their wider application. Therefore, it is essential to consider cross-modality gait recognition between cameras and LiDARs for a broader range of applications. In this work, we propose the first cross-modality gait recognition framework between Camera and LiDAR, namely CL-Gait. It employs a two-stream network for feature embedding of both modalities. This poses a challenging recognition task because 3D and 2D data must be matched despite their significant modality discrepancy. To align the feature spaces of the two modalities, i.e., camera silhouettes and LiDAR points, we propose a contrastive pre-training strategy to mitigate the modality discrepancy. To make up for the absence of paired camera-LiDAR data for pre-training, we also introduce a strategy for generating data on a large scale. This strategy utilizes monocular depth estimated from single RGB images and virtual cameras to generate pseudo point clouds for contrastive pre-training. Extensive experiments show that cross-modality gait recognition is very challenging but remains feasible and promising with our proposed model and pre-training strategy. To the best of our knowledge, this is the first work to address cross-modality gait recognition.
https://arxiv.org/abs/2407.02038
Gait recognition is a biometric technology that identifies individuals by using walking patterns. Due to the significant achievements of multimodal fusion in gait recognition, we consider employing LiDAR-camera fusion to obtain robust gait representations. However, existing methods often overlook intrinsic characteristics of modalities, and lack fine-grained fusion and temporal modeling. In this paper, we introduce a novel modality-sensitive network LiCAF for LiDAR-camera fusion, which employs an asymmetric modeling strategy. Specifically, we propose Asymmetric Cross-modal Channel Attention (ACCA) and Interlaced Cross-modal Temporal Modeling (ICTM) for cross-modal valuable channel information selection and powerful temporal modeling. Our method achieves state-of-the-art performance (93.9% in Rank-1 and 98.8% in Rank-5) on the SUSTech1K dataset, demonstrating its effectiveness.
https://arxiv.org/abs/2406.12355
Existing deep learning methods have made significant progress in gait recognition. Typically, appearance-based models binarize inputs into silhouette sequences. However, mainstream quantization methods prioritize minimizing task loss over quantization error, which is detrimental to gait recognition with binarized inputs. Minor variations in silhouette sequences can be diminished in the network's intermediate layers due to the accumulation of quantization errors. To address this, we propose a differentiable soft quantizer, which better simulates the gradient of the round function during backpropagation. This enables the network to learn from subtle input perturbations. However, our theoretical analysis and empirical studies reveal that directly applying the soft quantizer can hinder network convergence. We further refine the training strategy to ensure convergence while simulating quantization errors. Additionally, we visualize the distribution of outputs from different samples in the feature space and observe significant changes compared to the full precision network, which harms performance. Based on this, we propose an Inter-class Distance-guided Distillation (IDD) strategy to preserve the relative distance between the embeddings of samples with different labels. Extensive experiments validate the effectiveness of our approach, demonstrating state-of-the-art accuracy across various settings and datasets. The code will be made publicly available.
https://arxiv.org/abs/2405.13859
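The abstract above proposes a differentiable soft quantizer that approximates the gradient of the round function. Its exact form is not given in the abstract, so the sketch below uses one widely used soft-rounding surrogate based on `tanh` as a stand-in. The sharpness parameter `alpha` and both function names are assumptions for illustration only.

```python
import math


def soft_round(x, alpha):
    """Smooth surrogate for round(): close to the identity for small alpha
    and close to hard rounding for large alpha. Integers map to themselves."""
    base = math.floor(x)
    offset = x - base - 0.5  # position within the unit interval, in [-0.5, 0.5)
    return base + 0.5 + math.tanh(alpha * offset) / (2 * math.tanh(alpha / 2))


def soft_round_grad(x, alpha):
    """Derivative of soft_round with respect to x. It is non-zero everywhere,
    unlike hard rounding, whose gradient is zero almost everywhere."""
    offset = x - math.floor(x) - 0.5
    sech_squared = 1.0 / math.cosh(alpha * offset) ** 2
    return alpha * sech_squared / (2 * math.tanh(alpha / 2))


nearly_hard = soft_round(2.7, alpha=50.0)      # approaches round(2.7)
nearly_identity = soft_round(2.7, alpha=1e-3)  # approaches 2.7
slope = soft_round_grad(2.7, alpha=5.0)
```

Because the surrogate has a usable gradient, small perturbations of the input still change the output during backpropagation, which is the property the abstract highlights for learning from subtle silhouette variations.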