Gait recognition is a rapidly progressing technique for the remote identification of individuals. Prior research predominantly employed 2D sensors to gather gait data and achieved notable advancements; nonetheless, it unavoidably neglected the influence of 3D dynamic characteristics on recognition. Gait recognition using LiDAR 3D point clouds not only directly captures 3D spatial features but also diminishes the impact of lighting conditions while ensuring privacy protection. The essence of the problem lies in how to effectively extract discriminative 3D dynamic representations from point clouds. In this paper, we propose a method named SpheriGait for extracting and enhancing dynamic features from point clouds for LiDAR-based gait recognition. Specifically, it substitutes spherical projection for the conventional point cloud plane projection to augment the perception of dynamic features. Additionally, a network block named DAM-L is proposed to extract gait cues from the projected point cloud data. We conducted extensive experiments; the results demonstrate that SpheriGait achieves state-of-the-art performance on the SUSTech1K dataset and verify that spherical projection can serve as a universal data preprocessing technique that enhances other LiDAR-based gait recognition methods, exhibiting exceptional flexibility and practicality.
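To make the projection step concrete, here is a minimal sketch of spherical (range-image) projection for a single LiDAR frame, assuming points arrive as an (N, 3) array in the sensor frame; the resolution and field-of-view values are illustrative placeholders, not SpheriGait's actual settings.

```python
# Minimal spherical projection sketch: map 3D points to an h x w range image.
import numpy as np

def spherical_project(points, h=64, w=256, fov_up=15.0, fov_down=-15.0):
    """Project an (N, 3) point cloud onto an h x w spherical range image."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-8           # range per point
    yaw = np.arctan2(y, x)                              # azimuth in [-pi, pi]
    pitch = np.arcsin(z / r)                            # elevation angle

    fov_up_rad, fov_down_rad = np.radians(fov_up), np.radians(fov_down)
    u = 0.5 * (1.0 - yaw / np.pi) * w                   # column from azimuth
    v = (1.0 - (pitch - fov_down_rad) / (fov_up_rad - fov_down_rad)) * h

    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    image = np.zeros((h, w), dtype=np.float32)
    order = np.argsort(-r)            # write far points first so near ones win
    image[v[order], u[order]] = r[order]
    return image
```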
https://arxiv.org/abs/2409.11869
Gait recognition has attracted increasing attention from academia and industry as a technology for recognizing humans from a distance in non-intrusive ways without requiring cooperation. Although advanced methods have achieved impressive success in lab scenarios, most of them perform poorly in the wild. Recently, some Convolutional Neural Network (ConvNet)-based methods have been proposed to address the issue of gait recognition in the wild. However, the temporal receptive field obtained by convolution operations is limited for long gait sequences. If convolution blocks are directly replaced with visual transformer blocks, the model may not enhance the local temporal receptive field, which is important for covering a complete gait cycle. To address this issue, we design a Global-Local Temporal Receptive Field Network (GLGait). GLGait employs a Global-Local Temporal Module (GLTM) to establish a global-local temporal receptive field, which mainly consists of a Pseudo Global Temporal Self-Attention (PGTA) and a temporal convolution operation. Specifically, PGTA is used to obtain a pseudo global temporal receptive field with less memory and computation complexity than multi-head self-attention (MHSA). The temporal convolution operation is used to enhance the local temporal receptive field, and it can also aggregate the pseudo global temporal receptive field into a true holistic temporal receptive field. Furthermore, we propose a Center-Augmented Triplet Loss (CTL) in GLGait to reduce the intra-class distance and expand the positive samples in the training stage. Extensive experiments show that our method obtains state-of-the-art results on in-the-wild datasets, i.e., Gait3D and GREW. The code is available at this https URL.
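The abstract does not spell out PGTA's internals, so the following is only a hedged sketch of a global-local temporal block in the GLTM spirit: attention over temporally pooled keys/values stands in for the memory-saving pseudo global attention, and a temporal convolution supplies the local receptive field. The class name `GlobalLocalTemporalBlock` and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class GlobalLocalTemporalBlock(nn.Module):
    def __init__(self, channels, pool=4, kernel=3):
        super().__init__()
        self.q = nn.Linear(channels, channels)
        self.kv = nn.Linear(channels, 2 * channels)
        self.pool = nn.AvgPool1d(pool)                 # shrink the K/V length
        self.conv = nn.Conv1d(channels, channels,
                              kernel, padding=kernel // 2)
        self.scale = channels ** -0.5

    def forward(self, x):                              # x: (B, T, C)
        q = self.q(x)                                  # full-length queries
        kv = self.pool(x.transpose(1, 2)).transpose(1, 2)
        k, v = self.kv(kv).chunk(2, dim=-1)            # pooled keys/values
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        global_feat = attn @ v                         # pseudo-global context
        local_feat = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return x + global_feat + local_feat            # residual fusion
```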
https://arxiv.org/abs/2408.06834
Gait recognition with radio frequency (RF) signals enables many potential applications requiring accurate identification. However, current systems require individuals to be within a line-of-sight (LOS) environment and struggle with low signal-to-noise ratio (SNR) when signals traverse concrete and thick walls. To address these challenges, we present TRGR, a novel transmissive reconfigurable intelligent surface (RIS)-aided gait recognition system. TRGR can recognize human identities through walls using only the magnitude measurements of channel state information (CSI) from a pair of transceivers. Specifically, by leveraging a transmissive RIS alongside a configuration alternating optimization algorithm, TRGR enhances wall penetration and signal quality, enabling accurate gait recognition. Furthermore, a residual convolution network (RCNN) is proposed as the backbone network to learn robust human information. Experimental results confirm the efficacy of transmissive RIS, highlighting its significant potential for enhancing RF-based gait recognition systems. Extensive experimental results show that TRGR achieves an average accuracy of 97.88% in identifying persons when signals traverse concrete walls, demonstrating the effectiveness and robustness of TRGR.
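As a rough illustration of the kind of backbone described, here is a generic residual convolution block that could be stacked over CSI magnitude inputs; the layer sizes and input layout (subcarriers x time) are assumptions, not TRGR's actual RCNN configuration.

```python
import torch
import torch.nn as nn

class ResidualConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 shortcut so the skip connection matches the output channels.
        self.skip = (nn.Conv2d(in_ch, out_ch, 1)
                     if in_ch != out_ch else nn.Identity())

    def forward(self, x):       # x: (B, in_ch, subcarriers, time)
        return torch.relu(self.body(x) + self.skip(x))
```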
https://arxiv.org/abs/2407.21566
Biometric recognition has primarily addressed closed-set identification, assuming all probe subjects are in the gallery. However, most practical applications involve open-set biometrics, where probe subjects may or may not be present in the gallery. This poses distinct challenges in effectively distinguishing individuals in the gallery while minimizing false detections. While it is commonly believed that powerful biometric models can excel in both closed- and open-set scenarios, existing loss functions are inconsistent with open-set evaluation. They treat genuine (mated) and imposter (non-mated) similarity scores symmetrically and neglect the relative magnitudes of imposter scores. To address these issues, we simulate open-set evaluation using minibatches during training and introduce novel loss functions: (1) the identification-detection loss optimized for open-set performance under selective thresholds and (2) relative threshold minimization to reduce the maximum negative score for each probe. Across diverse biometric tasks, including face recognition, gait recognition, and person re-identification, our experiments demonstrate the effectiveness of the proposed loss functions, significantly enhancing open-set performance while positively impacting closed-set performance. Our code and models are available at this https URL.
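A hedged sketch of the "reduce the maximum negative score for each probe" idea, treating a minibatch as a simulated open-set trial: each probe's largest imposter cosine similarity is pushed down. The smooth logsumexp maximum and softplus penalty are implementation assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def relative_threshold_loss(embeddings, labels, tau=0.1):
    """embeddings: (B, D) features; labels: (B,) identity ids."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t()                                   # pairwise cosine scores
    imposter = labels.unsqueeze(0) != labels.unsqueeze(1)
    neg = sim.masked_fill(~imposter, float('-inf'))   # keep non-mated pairs
    # Smooth per-probe maximum imposter score via temperature logsumexp.
    max_neg = tau * torch.logsumexp(neg / tau, dim=1)
    return F.softplus(max_neg).mean()                 # drive max negatives down
```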
https://arxiv.org/abs/2407.16133
Gait recognition is a biometric technology that recognizes the identity of humans through their walking patterns. Existing appearance-based methods utilize CNNs or Transformers to extract spatial and temporal features from silhouettes, while model-based methods employ GCNs to focus on the special topological structure of skeleton points. However, the quality of silhouettes is limited by complex occlusions, and skeletons lack dense semantic features of the human body. To tackle these problems, we propose a novel gait recognition framework, dubbed Gait Multi-model Aggregation Network (GaitMA), which effectively combines the two modalities to obtain a more robust and comprehensive gait representation for recognition. First, skeletons are represented by joint/limb-based heatmaps, and features from silhouettes and skeletons are respectively extracted using two CNN-based feature extractors. Second, a co-attention alignment module is proposed to align the features by element-wise attention. Finally, we propose a mutual learning module that achieves feature fusion through cross-attention; Wasserstein loss is further introduced to ensure the effective fusion of the two modalities. Extensive experimental results demonstrate the superiority of our model on Gait3D, OU-MVLP, and CASIA-B.
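As an illustration of element-wise alignment, here is a minimal co-attention sketch in the spirit of GaitMA's module: each modality is gated by an attention map computed from the concatenation of both feature maps. The exact layer layout is an assumption.

```python
import torch
import torch.nn as nn

class CoAttentionAlign(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gate_sil = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())
        self.gate_ske = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, f_sil, f_ske):          # both: (B, C, H, W)
        joint = torch.cat([f_sil, f_ske], dim=1)
        # Element-wise attention re-weights each modality using the other.
        return f_sil * self.gate_sil(joint), f_ske * self.gate_ske(joint)
```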
https://arxiv.org/abs/2407.14812
Gait recognition is a biometric technology that distinguishes individuals by their walking patterns. However, previous methods face challenges when accurately extracting identity features because they often become entangled with non-identity clues. To address this challenge, we propose CLTD, a causality-inspired discriminative feature learning module designed to effectively eliminate the influence of confounders in triple domains, i.e., spatial, temporal, and spectral. Specifically, we utilize the Cross Pixel-wise Attention Generator (CPAG) to generate attention distributions for factual and counterfactual features in spatial and temporal domains. Then, we introduce the Fourier Projection Head (FPH) to project spatial features into the spectral space, which preserves essential information while reducing computational costs. Additionally, we employ an optimization method with contrastive learning to enforce semantic consistency constraints across sequences from the same subject. Our approach has demonstrated significant performance improvements on challenging datasets, proving its effectiveness. Moreover, it can be seamlessly integrated into existing gait recognition methods.
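A hedged sketch of a Fourier-style projection head: spatial features pass through a 2D real FFT, and only low-frequency magnitudes feed a linear layer, preserving coarse structure while shrinking the input. The crop size and magnitude choice are assumptions, not CLTD's exact FPH.

```python
import torch
import torch.nn as nn

class FourierProjectionHead(nn.Module):
    def __init__(self, channels, keep=8, out_dim=256):
        super().__init__()
        self.keep = keep
        self.fc = nn.Linear(channels * keep * keep, out_dim)

    def forward(self, x):                        # x: (B, C, H, W)
        spec = torch.fft.rfft2(x, norm='ortho')  # spectral representation
        mag = spec.abs()[:, :, :self.keep, :self.keep]  # low-frequency crop
        return self.fc(mag.flatten(1))           # compact spectral embedding
```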
https://arxiv.org/abs/2407.12519
Gait recognition, which aims at identifying individuals by their walking patterns, has achieved great success based on silhouettes. The binary silhouette sequence encodes the walking pattern within a sparse boundary representation. Therefore, most pixels in the silhouette are under-sensitive to the walking pattern, since the sparse boundary lacks dense spatial-temporal information that is better represented with dense texture. To enhance the sensitivity to the walking pattern while maintaining the robustness of recognition, we present a Complementary Learning with neural Architecture Search (CLASH) framework, consisting of a walking-pattern-sensitive gait descriptor named the dense spatial-temporal field (DSTF) and neural architecture search-based complementary learning (NCL). Specifically, DSTF transforms the representation from the sparse binary boundary into a dense distance-based texture, which is sensitive to the walking pattern at the pixel level. Further, NCL presents a task-specific search space for complementary learning, which mutually complements the sensitivity of DSTF and the robustness of the silhouette to represent the walking pattern effectively. Extensive experiments demonstrate the effectiveness of the proposed methods under both in-the-lab and in-the-wild scenarios. On CASIA-B, we achieve rank-1 accuracy of 98.8%, 96.5%, and 89.3% under three conditions. On OU-MVLP, we achieve rank-1 accuracy of 91.9%. On the latest in-the-wild datasets, we outperform the latest silhouette-based methods by 16.3% and 19.7% on Gait3D and GREW, respectively.
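The abstract describes DSTF as a dense distance-based texture; a natural way to realize such a representation is the Euclidean distance transform, sketched below, which turns a sparse binary boundary into a dense per-pixel distance field. This is an illustrative reading, not the paper's exact construction.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def dense_distance_texture(silhouette):
    """silhouette: (H, W) binary mask -> (H, W) signed distance texture."""
    inside = distance_transform_edt(silhouette)        # distance to background
    outside = distance_transform_edt(1 - silhouette)   # distance to foreground
    return inside - outside       # signed field: positive inside the body
```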
https://arxiv.org/abs/2407.03632
Gait recognition is a crucial biometric identification technique. Camera-based gait recognition has been widely applied in both research and industrial fields. LiDAR-based gait recognition has also begun to evolve recently, owing to the 3D structural information it provides. However, in certain applications cameras fail to recognize persons, such as in low-light environments and long-distance recognition scenarios, where LiDARs work well. On the other hand, the deployment cost and complexity of LiDAR systems limit their wider application. Therefore, it is essential to consider cross-modality gait recognition between cameras and LiDARs for a broader range of applications. In this work, we propose the first cross-modality gait recognition framework between camera and LiDAR, namely CL-Gait. It employs a two-stream network for feature embedding of both modalities. This poses a challenging recognition task, since matching inherently different 3D and 2D data exhibits significant modality discrepancy. To align the feature spaces of the two modalities, i.e., camera silhouettes and LiDAR points, we propose a contrastive pre-training strategy to mitigate modality discrepancy. To make up for the absence of paired camera-LiDAR data for pre-training, we also introduce a strategy for generating data on a large scale. This strategy utilizes monocular depth estimated from single RGB images and virtual cameras to generate pseudo point clouds for contrastive pre-training. Extensive experiments show that cross-modality gait recognition is very challenging but still feasible and promising with our proposed model and pre-training strategy. To the best of our knowledge, this is the first work to address cross-modality gait recognition.
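The pseudo point cloud generation step can be illustrated by back-projecting a monocular depth map through virtual-camera intrinsics, as sketched below; the intrinsic values are placeholders, not CL-Gait's actual configuration.

```python
import numpy as np

def depth_to_pseudo_points(depth, fx=500.0, fy=500.0, cx=None, cy=None):
    """depth: (H, W) metric depth map -> (H*W, 3) pseudo point cloud."""
    h, w = depth.shape
    cx = w / 2.0 if cx is None else cx
    cy = h / 2.0 if cy is None else cy
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx            # pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```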
https://arxiv.org/abs/2407.02038
Gait recognition is a biometric technology that identifies individuals by their walking patterns. Given the significant achievements of multimodal fusion in gait recognition, we consider employing LiDAR-camera fusion to obtain robust gait representations. However, existing methods often overlook the intrinsic characteristics of the modalities and lack fine-grained fusion and temporal modeling. In this paper, we introduce LiCAF, a novel modality-sensitive network for LiDAR-camera fusion that employs an asymmetric modeling strategy. Specifically, we propose Asymmetric Cross-modal Channel Attention (ACCA) and Interlaced Cross-modal Temporal Modeling (ICTM) for selecting valuable cross-modal channel information and for powerful temporal modeling. Our method achieves state-of-the-art performance (93.9% Rank-1 and 98.8% Rank-5) on the SUSTech1K dataset, demonstrating its effectiveness.
https://arxiv.org/abs/2406.12355
Existing deep learning methods have made significant progress in gait recognition. Typically, appearance-based models binarize inputs into silhouette sequences. However, mainstream quantization methods prioritize minimizing task loss over quantization error, which is detrimental to gait recognition with binarized inputs. Minor variations in silhouette sequences can be diminished in the network's intermediate layers due to the accumulation of quantization errors. To address this, we propose a differentiable soft quantizer, which better simulates the gradient of the round function during backpropagation. This enables the network to learn from subtle input perturbations. However, our theoretical analysis and empirical studies reveal that directly applying the soft quantizer can hinder network convergence. We further refine the training strategy to ensure convergence while simulating quantization errors. Additionally, we visualize the distribution of outputs from different samples in the feature space and observe significant, performance-harming changes compared to the full-precision network. Based on this, we propose an Inter-class Distance-guided Distillation (IDD) strategy to preserve the relative distance between the embeddings of samples with different labels. Extensive experiments validate the effectiveness of our approach, demonstrating state-of-the-art accuracy across various settings and datasets. The code will be made publicly available.
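A hedged sketch of a differentiable soft quantizer: the forward pass emits the hard rounded value while the backward pass uses the gradient of a smooth sigmoid approximation of rounding, so subtle input perturbations still yield gradients. The temperature `alpha` is an assumption, not the paper's design.

```python
import torch

def soft_quantize(x, alpha=5.0):
    floor = torch.floor(x)
    frac = x - floor
    soft = floor + torch.sigmoid(alpha * (frac - 0.5))  # smooth rounding curve
    hard = torch.round(x)
    # Straight-through trick: hard values forward, soft gradients backward.
    return soft + (hard - soft).detach()
```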
https://arxiv.org/abs/2405.13859
Gait recognition, a rapidly advancing vision technology for person identification from a distance, has made significant strides in indoor settings. However, evidence suggests that existing methods often yield unsatisfactory results when applied to newly released real-world gait datasets. Furthermore, conclusions drawn from indoor gait datasets may not easily generalize to outdoor ones. Therefore, the primary goal of this work is to present a comprehensive benchmark study aimed at improving practicality rather than solely focusing on enhancing performance. To this end, we first develop OpenGait, a flexible and efficient gait recognition platform. Using OpenGait as a foundation, we conduct in-depth ablation experiments to revisit recent developments in gait recognition. Surprisingly, we detect imperfections in certain prior methods, resulting in several critical yet previously undiscovered insights. Inspired by these findings, we develop three structurally simple yet empirically powerful and practically robust baseline models, i.e., DeepGaitV2, SkeletonGait, and SkeletonGait++, respectively representing the appearance-based, model-based, and multi-modal methodology for gait pattern description. Beyond achieving SoTA performances, more importantly, our careful exploration sheds new light on the modeling experience of deep gait models, the representational capacity of typical gait modalities, and so on. We hope this work can inspire further research and application of gait recognition towards better practicality. The code is available at this https URL.
https://arxiv.org/abs/2405.09138
Surveillance footage represents a valuable resource and an opportunity for conducting gait analysis. However, the typical low quality and high noise levels in such footage can severely impact the accuracy of pose estimation algorithms, which are foundational for reliable gait analysis. Existing literature suggests a direct correlation between the efficacy of pose estimation and the subsequent gait analysis results. A common mitigation strategy involves fine-tuning pose estimation models on noisy data to improve robustness. However, this approach may degrade the downstream model's performance on the original high-quality data, leading to a trade-off that is undesirable in practice. We propose a processing pipeline that incorporates a task-targeted artifact correction model specifically designed to pre-process and enhance surveillance footage before pose estimation. Our artifact correction model is optimized to work alongside a state-of-the-art pose estimation network, HRNet, without requiring repeated fine-tuning of the pose estimation model. Furthermore, we propose a simple and robust method for obtaining low-quality videos that are automatically annotated with poses, for the purpose of training the artifact correction model. We systematically evaluate the performance of our artifact correction model against a range of noisy surveillance data and demonstrate that our approach not only achieves improved pose estimation on low-quality surveillance footage, but also preserves the integrity of the pose estimation on high-resolution footage. Our experiments show a clear enhancement in gait analysis performance, supporting the viability of the proposed method as a superior alternative to direct fine-tuning strategies. Our contributions pave the way for more reliable gait analysis using surveillance data in real-world applications, regardless of data quality.
https://arxiv.org/abs/2404.12183
Gait is a behavioral biometric modality that can be used to recognize individuals by the way they walk from a far distance. Most existing gait recognition approaches rely on either silhouettes or skeletons, while their joint use is underexplored. Features from silhouettes and skeletons can provide complementary information for more robust recognition against appearance changes or pose estimation errors. To exploit the benefits of both silhouette and skeleton features, we propose a new gait recognition network, referred to as GaitPoint+. Our approach models skeleton key points as a 3D point cloud and employs a computational complexity-conscious 3D point processing approach to extract skeleton features, which are then combined with silhouette features for improved accuracy. Since silhouette- or CNN-based methods already require a considerable amount of computational resources, it is preferable that the key point learning module be faster and more lightweight. We present a detailed analysis of the utilization of every human key point after the use of traditional max-pooling, and show that while elbow and ankle points are used most commonly, many useful points are discarded by max-pooling. Thus, we present a method to recycle some of the discarded points with a Recycling Max-Pooling module during the processing of skeleton point clouds, achieving further performance improvement. We provide a comprehensive set of experimental results showing that (i) incorporating skeleton features obtained by a point-based 3D point cloud processing approach boosts the performance of three different state-of-the-art silhouette- and CNN-based baselines; (ii) recycling the discarded points increases the accuracy further. Ablation studies are also provided to show the effectiveness and contribution of different components of our approach.
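The recycling idea can be sketched as follows: instead of keeping only the single maximum response per feature channel (standard PointNet-style max-pooling, which discards most points), the runner-up responses are also aggregated. Using the mean of the top-k runners-up as the recycled signal is an assumption, not the paper's exact module.

```python
import torch

def recycling_max_pool(point_feats, k=4):
    """point_feats: (B, C, N) per-point features -> (B, 2*C) descriptor."""
    top, _ = point_feats.topk(k, dim=2)       # k strongest points per channel
    pooled_max = top[:, :, 0]                 # the usual max-pooled feature
    recycled = top[:, :, 1:].mean(dim=2)      # recycle the discarded points
    return torch.cat([pooled_max, recycled], dim=1)
```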
https://arxiv.org/abs/2404.10213
Current gait recognition research mainly focuses on identifying pedestrians captured by the same type of sensor, neglecting the fact that individuals may be captured by different sensors in order to adapt to various environments. A more practical approach should involve cross-modality matching across different sensors. Hence, this paper focuses on investigating the problem of cross-modality gait recognition, with the objective of accurately identifying pedestrians across diverse vision sensors. We present CrossGait, inspired by the feature alignment strategy, capable of cross-retrieving diverse data modalities. Specifically, we investigate the cross-modality recognition task by initially extracting features within each modality and subsequently aligning these features across modalities. To further enhance the cross-modality performance, we propose a Prototypical Modality-shared Attention Module that learns modality-shared features from two modality-specific features. Additionally, we design a Cross-modality Feature Adapter that transforms the learned modality-specific features into a unified feature space. Extensive experiments conducted on the SUSTech1K dataset demonstrate the effectiveness of CrossGait: (1) it exhibits promising cross-modality ability in retrieving pedestrians across various modalities from different sensors in diverse scenes, and (2) CrossGait not only learns modality-shared features for cross-modality gait recognition but also maintains modality-specific features for single-modality recognition.
https://arxiv.org/abs/2404.04120
Gait recognition aims to identify a person based on their walking sequences, serving as a useful biometric modality as it can be observed from long distances without requiring cooperation from the subject. In representing a person's walking sequence, silhouettes and skeletons are the two primary modalities used. Silhouette sequences lack detailed part information when overlapping occurs between different body segments and are affected by carried objects and clothing. Skeletons, comprising joints and bones connecting the joints, provide more accurate part information for different segments; however, they are sensitive to occlusions and low-quality images, causing inconsistencies in frame-wise results within a sequence. In this paper, we explore the use of a two-stream representation of skeletons for gait recognition, alongside silhouettes. By fusing the combined data of silhouettes and skeletons, we refine the two-stream skeletons, joints, and bones through self-correction in graph convolution, along with cross-modal correction with temporal consistency from silhouettes. We demonstrate that with refined skeletons, the performance of the gait recognition model can achieve further improvement on public gait recognition datasets compared with state-of-the-art methods without extra annotations.
https://arxiv.org/abs/2404.02345
Each person has a unique gait, i.e., walking style, that can be used as a biometric for personal identification. Recent works have demonstrated effective gait recognition using deep neural networks; however, most of these works predominantly focus on classification accuracy rather than model efficiency. In order to perform gait recognition using wearable devices on the edge, it is imperative to develop highly efficient low-power models that can be deployed on small form-factor devices such as microcontrollers. In this paper, we propose a small CNN model with 4 layers that is very amenable to edge AI deployment and real-time gait recognition. This model was trained on a public gait dataset with 20 classes, augmented with data collected by the authors, for 24 classes in total. Our model achieves 96.7% accuracy and consumes only 5KB RAM with an inference time of 70 ms and 125 mW power while running continuous inference on an Arduino Nano 33 BLE Sense. We successfully demonstrated real-time identification of the authors with the model running on the Arduino, underscoring its efficacy and providing a proof of feasibility for deployment in practical systems in the near future.
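For scale, a 4-layer CNN of roughly this size is sketched below; the filter counts and input shape are assumptions, while the 24-class head mirrors the abstract.

```python
import torch.nn as nn

# A hedged sketch of a microcontroller-class 4-layer CNN (3 conv + 1 linear).
tiny_gait_cnn = nn.Sequential(
    nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),    # layer 1
    nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),   # layer 2
    nn.Conv2d(16, 16, 3, stride=2, padding=1), nn.ReLU(),  # layer 3
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 24),                                     # layer 4: 24 classes
)
```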
https://arxiv.org/abs/2404.15312
In this paper, we address the challenge of making ViT models more robust to unseen affine transformations. Such robustness becomes useful in various recognition tasks such as face recognition when image alignment failures occur. We propose a novel method called KP-RPE, which leverages key points (e.g., facial landmarks) to make ViT more resilient to scale, translation, and pose variations. We begin with the observation that Relative Position Encoding (RPE) is a good way to bring affine transform generalization to ViTs. RPE, however, can only inject the model with prior knowledge that nearby pixels are more important than far pixels. Keypoint RPE (KP-RPE) is an extension of this principle, where the significance of pixels is not solely dictated by their proximity but also by their relative positions to specific keypoints within the image. By anchoring the significance of pixels around keypoints, the model can more effectively retain spatial relationships, even when those relationships are disrupted by affine transformations. We show the merit of KP-RPE in face and gait recognition. The experimental results demonstrate the effectiveness in improving face recognition performance from low-quality images, particularly where alignment is prone to failure. Code and pre-trained models are available.
https://arxiv.org/abs/2403.14852
Gait recognition stands as one of the most pivotal remote identification technologies and is progressively expanding across research and industrial communities. However, existing gait recognition methods heavily rely on task-specific upstream models driven by supervised learning to provide explicit gait representations, which inevitably introduce expensive annotation costs and potentially cause cumulative errors. Escaping this trend, this work explores effective gait representations based on the all-purpose knowledge produced by task-agnostic Large Vision Models (LVMs) and proposes a simple yet efficient gait framework, termed BigGait. Specifically, the Gait Representation Extractor (GRE) in BigGait effectively transforms all-purpose knowledge into implicit gait features in an unsupervised manner, drawing on design principles of established gait representation construction approaches. Experimental results on CCPG, CASIA-B* and SUSTech1K indicate that BigGait significantly outperforms previous methods in both self-domain and cross-domain tasks in most cases, and provides a more practical paradigm for learning the next-generation gait representation. Finally, we delve into prospective challenges and promising directions in LVM-based gait recognition, aiming to inspire future work on this emerging topic. The source code will be available at this https URL.
https://arxiv.org/abs/2402.19122
Gait, an unobtrusive biometric, is valued for its capability to identify individuals at a distance, across external outfits and environmental conditions. This study challenges the prevailing assumption that vision-based gait recognition, in particular skeleton-based gait recognition, relies primarily on motion patterns, revealing a significant role of the implicit anthropometric information encoded in the walking sequence. We show through a comparative analysis that removing height information leads to notable performance degradation across three models and two benchmarks (CASIA-B and GREW). Furthermore, we propose a spatial transformer model processing individual poses, disregarding any temporal information, which achieves unreasonably good accuracy, emphasizing the bias towards appearance information and indicating spurious correlations in existing benchmarks. These findings underscore the need for a nuanced understanding of the interplay between motion and appearance in vision-based gait recognition, prompting a reevaluation of the methodological assumptions in this field. Our experiments indicate that "in-the-wild" datasets are less prone to spurious correlations, prompting the need for more diverse and large scale datasets for advancing the field.
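The height-removal control can be illustrated by rescaling each pose so its vertical extent is 1, stripping absolute stature while preserving motion, as in the sketch below; the joint layout and normalization details are assumptions.

```python
import numpy as np

def remove_height(sequence):
    """sequence: (T, J, 2) 2D joint coordinates over T frames."""
    out = sequence.copy()
    for t in range(len(out)):
        y = out[t, :, 1]
        span = y.max() - y.min() + 1e-8                 # subject's pixel height
        out[t] = (out[t] - out[t].mean(axis=0)) / span  # center and rescale
    return out
```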
https://arxiv.org/abs/2402.08320
Gait recognition is a promising biometric method that aims to identify pedestrians from their unique walking patterns. The silhouette modality, renowned for its easy acquisition, simple structure, sparse representation, and convenient modeling, has been widely employed in controlled in-the-lab research. However, as gait recognition rapidly advances from in-the-lab to in-the-wild scenarios, various conditions raise significant challenges for the silhouette modality, including 1) unidentifiable low-quality silhouettes (abnormal segmentation, severe occlusion, or even non-human shapes), and 2) identifiable but challenging silhouettes (background noise, non-standard posture, slight occlusion). To address these challenges, we revisit the gait recognition pipeline and approach gait recognition from a quality perspective, namely QAGait. Specifically, we propose a series of cost-effective quality assessment strategies, including Maximal Connect Area and Template Match to eliminate background noise and unidentifiable silhouettes, and an Alignment strategy to handle non-standard postures. We also propose two quality-aware loss functions to integrate silhouette quality into optimization within the embedding space. Extensive experiments demonstrate that our QAGait can guarantee both gait reliability and performance enhancement. Furthermore, our quality assessment strategies can seamlessly integrate with existing gait datasets, showcasing our superiority. Code is available at this https URL.
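A minimal sketch of a Maximal Connect Area-style check: keep only the largest connected component of a silhouette and reject frames whose foreground is implausibly small. The area threshold is an assumption.

```python
import numpy as np
from scipy.ndimage import label

def maximal_connect_area(silhouette, min_area=500):
    """silhouette: (H, W) binary mask -> cleaned mask, or None if rejected."""
    components, n = label(silhouette)
    if n == 0:
        return None
    sizes = np.bincount(components.ravel())[1:]        # skip background id 0
    largest = np.argmax(sizes) + 1                     # label of biggest blob
    if sizes[largest - 1] < min_area:
        return None                                    # unidentifiable frame
    return (components == largest).astype(silhouette.dtype)
```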
https://arxiv.org/abs/2401.13531