Identifying humans by their walking sequences, known as gait recognition, is a useful biometric task because gait can be observed from a long distance and does not require cooperation from the subject. Two common modalities for representing a person's walking sequence are silhouettes and joint skeletons. Silhouette sequences, which record the boundary of the walking person in each frame, may suffer from appearance variations caused by carried objects and clothing. Framewise joint detections are noisy and introduce jitter that is inconsistent across consecutive frames. In this paper, we combine silhouettes and skeletons and refine the framewise joint predictions for gait recognition using temporal information from the silhouette sequences. We show that the refined skeletons can improve gait recognition performance without extra annotations. We evaluate our method on four public datasets, CASIA-B, OUMVLP, Gait3D and GREW, and show state-of-the-art performance.
https://arxiv.org/abs/2304.07916
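The refinement idea above hinges on the observation that per-frame joint detections jitter. As a purely illustrative baseline (not the paper's silhouette-guided method), the sketch below smooths each joint trajectory with a moving average; the window size and the COCO-style 17-joint layout are assumptions.

```python
# Illustrative only: temporal smoothing of noisy per-frame joint detections.
# The paper learns its refinement from silhouette sequences; this moving
# average is just a baseline showing why temporal context reduces jitter.
import numpy as np

def smooth_joints(joints: np.ndarray, window: int = 5) -> np.ndarray:
    """joints: (T, J, 2) array of per-frame 2D joint detections."""
    kernel = np.ones(window) / window
    smoothed = np.empty_like(joints)
    for j in range(joints.shape[1]):
        for c in range(joints.shape[2]):
            # 'same' keeps length T; edges are zero-padded (slightly attenuated)
            smoothed[:, j, c] = np.convolve(joints[:, j, c], kernel, mode="same")
    return smoothed

# Example: 30 frames, 17 joints (assumed COCO layout), 2D coordinates
noisy = np.random.rand(30, 17, 2)
print(smooth_joints(noisy).shape)  # (30, 17, 2)
```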
Gait recognition is a biometric technology that identifies humans through their walking patterns. Compared with other biometric technologies, gait is more difficult to disguise and can be captured at long distances without the cooperation of subjects. Thus, it has unique potential and wide applicability in crime prevention and social security. At present, most gait recognition methods extract features directly from video frames to establish representations. However, these architectures treat all features equally and do not pay enough attention to dynamic features, i.e., representations of the dynamic parts of silhouettes over time (e.g., the legs). Since the dynamic parts of the human body are more informative than other parts (e.g., bags) during walking, in this paper we propose a novel, high-performance framework named DyGait. This is the first gait recognition framework designed to focus on the extraction of dynamic features. Specifically, to take full advantage of the dynamic information, we propose a Dynamic Augmentation Module (DAM), which automatically establishes spatial-temporal feature representations of the dynamic parts of the human body. The experimental results show that our DyGait network outperforms other state-of-the-art gait recognition methods. It achieves an average Rank-1 accuracy of 71.4% on the GREW dataset, 66.3% on the Gait3D dataset, 98.4% on the CASIA-B dataset and 98.3% on the OU-MVLP dataset.
https://arxiv.org/abs/2303.14953
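A hedged sketch of the core intuition behind dynamic feature extraction: subtracting the temporal mean of a feature sequence suppresses static parts (torso, carried bags) and keeps moving parts (legs, arms). The tensor layout is an assumption, and this is not DyGait's exact DAM.

```python
# Hedged sketch: isolate the "dynamic" component of spatio-temporal features
# by removing the per-location temporal average. Shapes are assumptions.
import torch

def dynamic_part(feats: torch.Tensor) -> torch.Tensor:
    """feats: (N, C, T, H, W) spatio-temporal feature maps."""
    static = feats.mean(dim=2, keepdim=True)  # temporal average = static component
    return feats - static                     # residual = dynamic component

x = torch.randn(2, 64, 30, 32, 22)
print(dynamic_part(x).shape)  # torch.Size([2, 64, 30, 32, 22])
```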
Previous gait recognition methods were primarily trained on labeled datasets, which require laborious labeling effort. However, using a pre-trained model on a new dataset without fine-tuning can lead to significant performance degradation. To make pre-trained gait recognition models fine-tunable on unlabeled datasets, we propose a new task: Unsupervised Gait Recognition (UGR). We introduce a cluster-based baseline that tackles UGR with cluster-level contrastive learning, but find that the task poses further challenges. First, sequences of the same person in different clothes tend to cluster separately due to significant appearance changes. Second, sequences taken from 0° and 180° views lack distinctive walking postures and do not cluster with sequences taken from other views. To address these challenges, we propose a Selective Fusion method, which comprises Selective Cluster Fusion (SCF) and Selective Sample Fusion (SSF). With SCF, we merge matched clusters of the same person wearing different clothes by updating the cluster-level memory bank with a multi-cluster update strategy. With SSF, we gradually merge sequences taken from front/back views via curriculum learning. Extensive experiments show the effectiveness of our method in improving rank-1 accuracy under the different-clothes and front/back-view conditions.
https://arxiv.org/abs/2303.10772
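The cluster-level contrastive baseline can be pictured with a centroid memory bank updated by momentum, as in the hedged sketch below; the momentum and temperature values are illustrative assumptions, and SCF's multi-cluster update strategy is not reproduced.

```python
# Hedged sketch of cluster-level contrastive learning with a momentum
# memory bank — the generic recipe, not the paper's exact baseline.
import torch
import torch.nn.functional as F

class ClusterMemory:
    def __init__(self, centroids: torch.Tensor, momentum: float = 0.2, temp: float = 0.05):
        self.bank = F.normalize(centroids, dim=1)  # (K, D) cluster centroids
        self.m, self.t = momentum, temp

    def loss(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        feats = F.normalize(feats, dim=1)
        logits = feats @ self.bank.T / self.t   # similarity to every centroid
        return F.cross_entropy(logits, labels)  # pull samples toward own cluster

    @torch.no_grad()
    def update(self, feats: torch.Tensor, labels: torch.Tensor):
        # Momentum update keeps centroids slowly tracking new features.
        for f, y in zip(F.normalize(feats, dim=1), labels):
            self.bank[y] = F.normalize(self.m * f + (1 - self.m) * self.bank[y], dim=0)
```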
Recent works on pose-based gait recognition have demonstrated the potential of using such simple information to achieve results comparable to silhouette-based methods. However, the generalization ability of pose-based methods on different datasets is undesirably inferior to that of silhouette-based ones, which has received little attention but hinders the application of these methods in real-world scenarios. To improve the generalization ability of pose-based methods across datasets, we propose a Generalized Pose-based Gait recognition (GPGait) framework. First, a Human-Oriented Transformation (HOT) and a series of Human-Oriented Descriptors (HOD) are proposed to obtain a unified pose representation with discriminative multi-features. Then, given the slight variations in the unified representation after HOT and HOD, it becomes crucial for the network to extract local-global relationships between the keypoints. To this end, a Part-Aware Graph Convolutional Network (PAGCN) is proposed to enable efficient graph partition and local-global spatial feature extraction. Experiments on four public gait recognition datasets, CASIA-B, OUMVLP-Pose, Gait3D and GREW, show that our model demonstrates better and more stable cross-domain capabilities compared to existing skeleton-based methods, achieving comparable recognition results to silhouette-based ones. The code will be released.
https://arxiv.org/abs/2303.05234
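As a rough illustration of a human-oriented transformation, the sketch below centers each skeleton on a root joint and rescales by body height so that poses from different datasets share one coordinate frame; the root index and the scale definition are assumptions, not GPGait's exact HOT.

```python
# Hedged sketch of pose normalization in the spirit of a human-oriented
# transformation. Root joint and height-based scale are assumptions.
import numpy as np

def normalize_pose(pose: np.ndarray, root: int = 0) -> np.ndarray:
    """pose: (J, 2) keypoints for one frame."""
    centered = pose - pose[root]                         # translate root to origin
    height = centered[:, 1].max() - centered[:, 1].min() # body extent along y
    return centered / max(height, 1e-6)                  # scale-invariant coords
```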
Gait recognition is a rapidly advancing vision technique for person identification from a distance. Prior studies predominantly employed relatively small and shallow neural networks to extract subtle gait features, achieving impressive successes in indoor settings. Nevertheless, experiments revealed that these existing methods mostly produce unsatisfactory results when applied to newly released in-the-wild gait datasets. This paper presents a unified perspective to explore how to construct deep models for state-of-the-art outdoor gait recognition, including the classical CNN-based and emerging Transformer-based architectures. Consequently, we emphasize the importance of suitable network capacity, explicit temporal modeling, and deep transformer structure for discriminative gait representation learning. Our proposed CNN-based DeepGaitV2 series and Transformer-based SwinGait series exhibit significant performance gains in outdoor scenarios, e.g., about +30% rank-1 accuracy compared with many state-of-the-art methods on the challenging GREW dataset. This work is expected to further boost the research and application of gait recognition. Code will be available at this https URL.
https://arxiv.org/abs/2303.03301
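One way to read "explicit temporal modeling" is to let convolutions mix adjacent frames as well as pixels. The hedged sketch below swaps a 2D block for a 3D one; the channel sizes and tensor layout are illustrative assumptions, not the DeepGaitV2 architecture.

```python
# Hedged sketch: a 3D convolutional block that mixes the temporal axis
# alongside the spatial axes. Dimensions are illustrative only.
import torch
import torch.nn as nn

temporal_block = nn.Sequential(
    nn.Conv3d(64, 64, kernel_size=(3, 3, 3), padding=1),  # (time, H, W) mixing
    nn.BatchNorm3d(64),
    nn.ReLU(inplace=True),
)

x = torch.randn(2, 64, 30, 32, 22)  # (N, C, T, H, W) silhouette features
print(temporal_block(x).shape)      # torch.Size([2, 64, 30, 32, 22])
```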
mmWave radar-based gait recognition is a novel user identification method that captures human gait biometrics from mmWave radar return signals. This technology offers privacy protection and is resilient to weather and lighting conditions. However, its generalization performance remains unknown, which limits its practical deployment. To address this problem, in this paper a non-synthetic dataset is collected and analyzed, revealing spatial and temporal domain shifts in mmWave gait biometric data that significantly impact identification accuracy. To mitigate these shifts, a novel self-aligned domain adaptation method called GaitSADA is proposed. GaitSADA improves system generalization performance through a two-stage semi-supervised training approach: the first stage uses semi-supervised contrastive learning, and the second uses semi-supervised consistency training with centroid alignment. Extensive experiments show that GaitSADA outperforms representative domain adaptation methods by an average of 15.41% in low-data regimes.
https://arxiv.org/abs/2301.13384
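Centroid alignment in the second stage can be sketched as pulling per-class target centroids toward their source counterparts; the pseudo-labels are assumed to come from the stage-one model, and this generic recipe is not GaitSADA itself.

```python
# Hedged sketch of centroid alignment for domain adaptation: match the
# per-class mean embeddings of source and target. Generic recipe only.
import torch

def centroid_alignment_loss(src_feats, src_labels, tgt_feats, tgt_pseudo, num_classes):
    loss = src_feats.new_zeros(())
    for c in range(num_classes):
        s = src_feats[src_labels == c]   # source samples of class c
        t = tgt_feats[tgt_pseudo == c]   # target samples pseudo-labeled c
        if len(s) and len(t):
            loss = loss + (s.mean(0) - t.mean(0)).pow(2).sum()
    return loss / num_classes
```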
Gait recognition, which identifies individuals based on their walking patterns, is an important biometric technique since gait can be observed from a distance and does not require the subject's cooperation. Recognizing a person's gait is difficult because of the appearance variations in human silhouette sequences produced by varying viewing angles, carried objects, and clothing. Recent research has produced a number of methods for coping with these variations. In this paper, we present the use of 3-D body shapes inferred from limited images, which are, in principle, invariant to the specified variations. Inferring 3-D shape is a difficult task, especially when a dataset provides only silhouettes. We provide a method for learning 3-D body inference from silhouettes by transferring knowledge from a 3-D shape prior learned from RGB photos. We apply our method to multiple existing state-of-the-art gait baselines and obtain consistent improvements for gait identification on two public datasets, CASIA-B and OUMVLP, across several variations and settings, including a new setting of novel views not seen during training.
https://arxiv.org/abs/2212.09042
Gait recognition from motion capture data, as a pattern classification discipline, can be improved by the use of machine learning. This paper contributes to the state-of-the-art with a statistical approach for extracting robust gait features directly from raw data by a modification of Linear Discriminant Analysis with the Maximum Margin Criterion. Experiments on the CMU MoCap database show that the suggested method outperforms thirteen relevant methods based on geometric features and a method that learns features by a combination of Principal Component Analysis and Linear Discriminant Analysis. The methods are evaluated in terms of the distribution of biometric templates in their respective feature spaces, expressed in a number of class separability coefficients and classification metrics. Results also indicate a high portability of learned features; that is, we can learn what aspects of walk people generally differ in and extract those as general gait features. Recognizing people without needing group-specific features is convenient, as particular people might not always provide annotated learning data. As a contribution to reproducible research, our evaluation framework and database have been made publicly available. This research makes motion capture technology directly applicable to human recognition.
https://arxiv.org/abs/1708.07755
As a contribution to reproducible research, this paper presents a framework and a database to improve the development, evaluation and comparison of methods for gait recognition from motion capture (MoCap) data. The evaluation framework provides implementation details and source codes of state-of-the-art human-interpretable geometric features as well as our own approaches where gait features are learned by a modification of Fisher's Linear Discriminant Analysis with the Maximum Margin Criterion, and by a combination of Principal Component Analysis and Linear Discriminant Analysis. It includes a description and source codes of a mechanism for evaluating four class separability coefficients of feature space and four rank-based classifier performance metrics. This framework also contains a tool for learning a custom classifier and for classifying a custom query on a custom gallery. We provide an experimental database along with source codes for its extraction from the general CMU MoCap database.
https://arxiv.org/abs/1701.00995
MoCap-based human identification, as a pattern recognition discipline, can be optimized using a machine learning approach. Yet in some applications, such as video surveillance, new identities can appear on the fly, and labeled data for all encountered people may not always be available. This work introduces the concept of learning walker-independent gait features directly from raw joint coordinates by a modification of Fisher's Linear Discriminant Analysis with the Maximum Margin Criterion. Our new approach shows not only that these features can discriminate different people than those they are learned on, but also that the number of learning identities can be much smaller than the number of walkers encountered in real operation.
https://arxiv.org/abs/1609.06936
In the field of gait recognition from motion capture data, designing human-interpretable gait features is a common practice of many fellow researchers. To refrain from ad-hoc schemes and to find maximally discriminative features we may need to explore beyond the limits of human interpretability. This paper contributes to the state-of-the-art with a machine learning approach for extracting robust gait features directly from raw joint coordinates. The features are learned by a modification of Linear Discriminant Analysis with Maximum Margin Criterion so that the identities are maximally separated and, in combination with an appropriate classifier, used for gait recognition. Experiments on the CMU MoCap database show that this method outperforms eight other relevant methods in terms of the distribution of biometric templates in respective feature spaces expressed in four class separability coefficients. Additional experiments indicate that this method is a leading concept for rank-based classifier systems.
https://arxiv.org/abs/1609.04392
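The four papers above share one core technique: learning a projection by Linear Discriminant Analysis with the Maximum Margin Criterion, i.e., taking the leading eigenvectors of S_b - S_w so that classes are pushed apart (between-class scatter) while kept compact (within-class scatter). A compact sketch:

```python
# Sketch of LDA with the Maximum Margin Criterion: maximize
# tr(W^T (S_b - S_w) W) via the top eigenvectors of S_b - S_w.
import numpy as np

def mmc_projection(X: np.ndarray, y: np.ndarray, dim: int) -> np.ndarray:
    """X: (N, D) raw gait vectors, y: (N,) identity labels. Returns (D, dim) W."""
    mu = X.mean(axis=0)
    Sb = np.zeros((X.shape[1], X.shape[1]))
    Sw = np.zeros_like(Sb)
    for c in np.unique(y):
        Xc = X[y == c]
        d = (Xc.mean(axis=0) - mu)[:, None]
        Sb += len(Xc) * d @ d.T                                   # between-class
        Sw += (Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0))   # within-class
    eigvals, eigvecs = np.linalg.eigh(Sb - Sw)                    # symmetric matrix
    return eigvecs[:, np.argsort(eigvals)[::-1][:dim]]
```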
LiDAR can capture accurate depth information in large-scale scenarios without being affected by lighting conditions, and the captured point cloud contains gait-related 3D geometric properties and dynamic motion characteristics. We make the first attempt to leverage LiDAR to remedy the limitations of view-dependent and light-sensitive cameras for more robust and accurate gait recognition. In this paper, we propose a LiDAR-camera-based gait recognition method with an effective multi-modal feature fusion strategy, which fully exploits the advantages of both point clouds and images. In particular, we propose a new in-the-wild gait dataset, LiCamGait, involving multi-modal visual data and diverse 2D/3D representations. Our method achieves state-of-the-art performance on the new dataset. Code and dataset will be released when this paper is published.
https://arxiv.org/abs/2211.12371
Gait recognition is an important AI task that has progressed rapidly with the development of deep learning. However, existing learning-based gait recognition methods mainly focus on a single domain, especially the constrained laboratory environment. In this paper, we study a new problem of unsupervised domain adaptive gait recognition (UDA-GR), which learns a gait identifier with supervised labels from indoor scenes (source domain) and applies it to outdoor wild scenes (target domain). For this purpose, we develop an uncertainty estimation and regularization based UDA-GR method. Specifically, we investigate the characteristics of gaits in indoor and outdoor scenes to estimate gait sample uncertainty, which is used during unsupervised fine-tuning on the target domain to alleviate the noise of the pseudo labels. We also establish a new benchmark for the proposed problem, on which experimental results show the effectiveness of the proposed method. We will release the benchmark and source code to the public.
https://arxiv.org/abs/2211.11155
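A hedged sketch of uncertainty-regularized pseudo-labeling: down-weight target samples whose predictions are uncertain so that noisy pseudo-labels contribute less to fine-tuning. Entropy is used here as a generic stand-in for the paper's uncertainty estimator.

```python
# Hedged sketch: entropy-weighted pseudo-label loss for unsupervised
# fine-tuning on the target domain. Generic recipe, not the paper's method.
import torch
import torch.nn.functional as F

def weighted_pseudo_loss(logits: torch.Tensor) -> torch.Tensor:
    probs = F.softmax(logits, dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
    max_entropy = torch.log(torch.tensor(float(logits.size(1))))
    weight = 1.0 - entropy / max_entropy          # high entropy -> low weight
    pseudo = probs.argmax(dim=1)                  # pseudo labels
    loss = F.cross_entropy(logits, pseudo, reduction="none")
    return (weight.detach() * loss).mean()
```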
Video-based gait recognition has achieved impressive results in constrained scenarios. However, visual cameras neglect human 3D structure information, which limits the feasibility of gait recognition in the 3D wild world. In this work, instead of extracting gait features from images, we explore precise 3D gait features from point clouds and propose a simple yet efficient 3D gait recognition framework, termed multi-view projection network (MVPNet). MVPNet first projects point clouds into multiple depth maps from different perspectives and then fuses the depth maps together to learn a compact representation with 3D geometry information. Due to the lack of point cloud datasets, we build the first large-scale LiDAR-based gait recognition dataset, LIDAR GAIT, collected by a LiDAR sensor and an RGB camera mounted on a robot. The dataset contains 25,279 sequences from 1,050 subjects and covers many variations, including visibility, views, occlusions, clothing, carrying, and scenes. Extensive experiments show that (1) 3D structure information serves as a significant feature for gait recognition; (2) MVPNet not only competes with five representative point-based methods but also outperforms existing camera-based methods by large margins; and (3) the LiDAR sensor is superior to the RGB camera for gait recognition in the wild. The LIDAR GAIT dataset and MVPNet code will be publicly available.
https://arxiv.org/abs/2211.10598
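The multi-view projection idea can be sketched by rotating the point cloud to several viewpoints and rasterizing the nearest depth per pixel; the resolution, view count, and orthographic projection are assumptions, not MVPNet's exact pipeline.

```python
# Hedged sketch: project a point cloud to depth maps from several
# viewpoints (rotation about the vertical axis, orthographic projection).
import numpy as np

def depth_map(points: np.ndarray, angle: float, res: int = 64) -> np.ndarray:
    """points: (N, 3). Returns a (res, res) depth image for one viewpoint."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])          # yaw rotation
    p = points @ R.T
    xy = ((p[:, :2] - p[:, :2].min(0)) / (np.ptp(p[:, :2], 0) + 1e-6)
          * (res - 1)).astype(int)
    depth = np.full((res, res), np.inf)
    for (u, v), z in zip(xy, p[:, 2]):
        depth[v, u] = min(depth[v, u], z)   # keep nearest point per pixel
    depth[np.isinf(depth)] = 0.0
    return depth

views = [depth_map(np.random.rand(1024, 3), a) for a in np.linspace(0, np.pi, 4)]
```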
Existing gait recognition frameworks retrieve an identity in the gallery based on the distance between a probe sample and the identities in the gallery. However, existing methods often neglect that the gallery may not contain identities corresponding to the probes, leading to recognition errors rather than raising an alarm. In this paper, we introduce a novel uncertainty-aware gait recognition method that models the uncertainty of identification based on learned evidence. Specifically, we treat our recognition model as an evidence collector that gathers evidence from input samples and parameterizes a Dirichlet distribution over the evidence. The Dirichlet distribution essentially represents the density of the probability assigned to the input samples. We utilize the distribution to evaluate the resultant uncertainty of each probe sample and then determine whether a probe has a counterpart in the gallery or not. To the best of our knowledge, our method is the first attempt to tackle gait recognition with uncertainty modeling. Moreover, our uncertainty modeling significantly improves robustness against out-of-distribution (OOD) queries. Extensive experiments demonstrate that our method achieves state-of-the-art performance on datasets with OOD queries and generalizes well to other identity-retrieval tasks. Importantly, our method outperforms the state-of-the-art by a large margin of 44.19% when the OOD query rate is around 50% on OUMVLP.
https://arxiv.org/abs/2211.08007
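Evidential uncertainty modeling of this kind typically maps network outputs to non-negative evidence that parameterizes a Dirichlet distribution, with low total evidence signaling high uncertainty. A hedged sketch, where the OOD threshold is an illustrative assumption:

```python
# Hedged sketch of evidential uncertainty: evidence -> Dirichlet alpha,
# with total uncertainty u = K / sum(alpha). Threshold is illustrative.
import torch
import torch.nn.functional as F

def dirichlet_uncertainty(logits: torch.Tensor):
    evidence = F.softplus(logits)          # non-negative evidence per class
    alpha = evidence + 1.0                 # Dirichlet concentration parameters
    K = logits.size(1)
    uncertainty = K / alpha.sum(dim=1)     # in (0, 1]; high = little evidence
    probs = alpha / alpha.sum(dim=1, keepdim=True)
    return probs, uncertainty

probs, u = dirichlet_uncertainty(torch.randn(4, 100))
is_ood = u > 0.5   # flag probes that likely have no counterpart in the gallery
```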
Gait recognition is one of the most important long-distance identification technologies and is gaining increasing popularity in both research and industry communities. Although significant progress has been made on indoor datasets, much evidence shows that gait recognition techniques perform poorly in the wild. More importantly, we also find that many conclusions from prior works change with the evaluation datasets. Therefore, the more critical goal of this paper is to present a comprehensive benchmark study for better practicality rather than only a particular model for better performance. To this end, we first develop a flexible and efficient gait recognition codebase named OpenGait. Based on OpenGait, we deeply revisit the recent development of gait recognition by re-conducting the ablative experiments. Encouragingly, we uncover many hidden problems in prior works and new insights for future research. Inspired by these discoveries, we develop a structurally simple, empirically powerful and practically robust baseline model, GaitBase. Experimentally, we comprehensively compare GaitBase with many current gait recognition methods on multiple public datasets, and the results show that GaitBase achieves significantly strong performance in most cases regardless of indoor or outdoor situations. The source code is available at this https URL.
https://arxiv.org/abs/2211.06597
Gait recognition is widely used in diverse practical applications. Currently, the most prevalent approach is to recognize human gait from RGB images, owing to the progress of computer vision technologies. Nevertheless, the perception capability of RGB cameras deteriorates in rough circumstances, and visual surveillance may cause privacy invasion. Due to the robustness and non-invasive nature of millimeter wave (mmWave) radar, radar-based gait recognition has attracted increasing attention in recent years. In this research, we propose a Hierarchical Dynamic Network (HDNet) for gait recognition using mmWave radar. To explore more dynamic information, we propose point flow as a novel point cloud descriptor. We also devise a dynamic frame sampling module to improve computational efficiency without noticeably degrading performance. To demonstrate the superiority of our method, we perform extensive experiments on two public mmWave radar-based gait recognition datasets, and the results show that our model is superior to existing state-of-the-art methods.
https://arxiv.org/abs/2211.00312
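The abstract does not define point flow, so the sketch below is only a guess at the flavor of such a descriptor: per-point displacement to the nearest neighbor in the next frame of the radar point cloud.

```python
# Hedged guess at a "point flow"-style descriptor between consecutive
# radar point cloud frames. The paper's actual definition may differ.
import numpy as np

def point_flow(p0: np.ndarray, p1: np.ndarray) -> np.ndarray:
    """p0: (N, 3), p1: (M, 3) point clouds from consecutive frames."""
    d = np.linalg.norm(p0[:, None, :] - p1[None, :, :], axis=-1)  # (N, M)
    nearest = d.argmin(axis=1)       # index of nearest neighbor in frame t+1
    return p1[nearest] - p0          # per-point displacement vectors
```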
Most existing gait recognition methods are appearance-based, relying on silhouettes extracted from video data of human walking activities. The less-investigated skeleton-based gait recognition methods learn gait dynamics directly from 2D/3D human skeleton sequences, which are theoretically more robust in the presence of appearance changes caused by clothes, hairstyles, and carried objects. However, the performance of skeleton-based solutions still lags far behind the appearance-based ones. This paper aims to close this performance gap by proposing a novel network model, GaitMixer, that learns a more discriminative gait representation from skeleton sequence data. In particular, GaitMixer follows a heterogeneous multi-axial mixer architecture, which applies a spatial self-attention mixer followed by a temporal large-kernel convolution mixer to learn rich multi-frequency signals in the gait feature maps. Experiments on the widely used gait database CASIA-B demonstrate that GaitMixer outperforms previous SOTA skeleton-based methods by a large margin while achieving competitive performance compared with representative appearance-based solutions. Code will be available at this https URL
https://arxiv.org/abs/2210.15491
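A hedged sketch of a heterogeneous multi-axial mixer block: self-attention mixes the joint (spatial) axis, then a large-kernel depthwise convolution mixes the temporal axis. All dimensions are illustrative assumptions, not GaitMixer's published configuration.

```python
# Hedged sketch: spatial self-attention mixer followed by a temporal
# large-kernel depthwise convolution mixer. Dimensions are illustrative.
import torch
import torch.nn as nn

class MultiAxialMixer(nn.Module):
    def __init__(self, dim: int = 64, kernel: int = 31):
        super().__init__()
        self.spatial = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.temporal = nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim)

    def forward(self, x):                        # x: (N, T, J, C)
        n, t, j, c = x.shape
        s = x.reshape(n * t, j, c)
        s, _ = self.spatial(s, s, s)             # attention across joints
        x = x + s.reshape(n, t, j, c)
        v = x.permute(0, 2, 3, 1).reshape(n * j, c, t)
        x = x + self.temporal(v).reshape(n, j, c, t).permute(0, 3, 1, 2)
        return x

out = MultiAxialMixer()(torch.randn(2, 30, 17, 64))  # (N, T, J, C) preserved
```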
As a unique biometric that can be perceived at a distance, gait has broad applications in person authentication, social security and so on. Existing gait recognition methods pay attention to extracting either spatial or spatiotemporal representations. However, they barely consider extracting diverse motion features, a fundamental characteristic of gait, from gait sequences. In this paper, we propose a novel motion-aware spatiotemporal feature learning network for gait recognition, termed GaitMAST, which can unleash the potential of motion-aware features. Specifically, in the shallow layers we propose a dual-path frame-level feature extractor, in which one path extracts overall spatiotemporal features and the other extracts motion-salient features by focusing on dynamic regions. In the deeper layers, we design a two-branch clip-level feature extractor, in which one branch focuses on fine-grained spatial information and the other on motion detail preservation. Consequently, our GaitMAST preserves the individual's unique walking patterns well, further enhancing the robustness of the spatiotemporal features. Extensive experimental results on two commonly used cross-view gait datasets demonstrate the superior performance of GaitMAST over existing state-of-the-art methods. On CASIA-B, our model achieves an average rank-1 accuracy of 94.1%. In particular, GaitMAST achieves rank-1 accuracies of 96.1% and 88.1% under the bag-carrying and coat-wearing conditions, respectively, outperforming the second best by a large margin and demonstrating its robustness against spatial variations.
https://arxiv.org/abs/2210.11817
While the Vision Transformer has been used in gait recognition, its application to multi-view gait recognition is still limited. Different views significantly affect the extraction of gait contour characteristics and the resulting identification accuracy. To address this, this paper proposes a Siamese Mobile Vision Transformer (SMViT). The model not only focuses on the local characteristics of the human gait space but also considers long-distance attention associations, allowing it to extract multi-dimensional gait features. In addition, it characterizes how different views affect gait features and generates reliable view-relation factors. The average recognition rate of SMViT on the CASIA-B dataset reached 96.4%. The experimental results show that SMViT attains state-of-the-art performance compared with advanced gait recognition models such as GaitGAN, Multi_view GAN, Posegait and other gait recognition models.
https://arxiv.org/abs/2210.10421