Gait recognition (GR) is a growing biometric modality used for person identification at a distance through visual cameras. GR provides a secure and reliable alternative to fingerprint and face recognition, as its signals are harder to fake than those of other traits. Furthermore, its resistance to spoofing makes GR suitable for a wide range of environments. With the rise of deep learning, steady strides have been made in GR technology, with promising results in various contexts. As video surveillance becomes more prevalent, new obstacles arise, such as ensuring uniform performance evaluation across different protocols, achieving reliable recognition despite shifting lighting conditions and fluctuations in gait patterns, and protecting privacy. This survey aims to give an overview of GR and to analyze the environmental factors and complications that can affect it in comparison to other biometric recognition systems. The primary goal is to examine the existing deep learning (DL) techniques employed for human GR that may generate new research opportunities.
https://arxiv.org/abs/2309.10144
Powered ankle prostheses effectively assist people with lower-limb amputation in performing daily activities. A high-performance prosthesis with adjustable compliance and the capability to predict and implement the amputee's intent is crucial for it to be comparable to, or better than, a real limb. However, current designs fail to provide simple yet effective joint compliance with full potential for modification, and they lack an accurate real-time gait prediction method. This paper proposes an innovative design for a powered ankle prosthesis with a series elastic actuator (SEA) and puts forward an MLP-based gait recognition method that can accurately and continuously predict multiple gait parameters for motion sensing and control. The prosthesis mimics the biological joint with similar weight, torque, and power, and can assist walking at up to 4 m/s. A new planar torsional spring design is proposed for the SEA, offering better stiffness, endurance, and potential for modification than current designs. The gait recognition system simultaneously estimates locomotion speed, gait phase, ankle angle, and angular velocity using only the signals of a single IMU, offering advantages in continuity, adaptability across the speed range, accuracy, and multi-functionality.
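The abstract describes an MLP that regresses four gait parameters from a single IMU. A minimal NumPy sketch of such a forward pass is shown below; the layer sizes, the 20-sample/6-axis input window, and the random weights are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(sizes):
    """Random initialisation for an MLP with the given layer sizes."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(x, params):
    """Forward pass: flattened IMU window -> 4 gait parameters."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)  # ReLU hidden layers
    w, b = params[-1]
    # Linear output: [speed, gait phase, ankle angle, angular velocity]
    return h @ w + b

# Hypothetical setup: a 20-sample window of 6-axis IMU data
# (accelerometer + gyroscope), flattened to a 120-dimensional input.
params = init_params([120, 64, 32, 4])
window = rng.normal(size=120)
speed, phase, angle, ang_vel = mlp_forward(window, params)
```

In practice such a network would be trained with a regression loss against motion-capture ground truth; this sketch only shows the input/output shape implied by the abstract.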
https://arxiv.org/abs/2309.08323
We present FastPoseGait, an open-source toolbox for pose-based gait recognition based on PyTorch. Our toolbox supports a set of cutting-edge pose-based gait recognition algorithms and a variety of related benchmarks. Unlike other pose-based projects that focus on a single algorithm, FastPoseGait integrates several state-of-the-art (SOTA) algorithms under a unified framework, incorporating both the latest advancements and best practices to ease the comparison of effectiveness and efficiency. In addition, to promote future research on pose-based gait recognition, we provide numerous pre-trained models and detailed benchmark results, which offer valuable insights and serve as a reference for further investigation. By leveraging the highly modular structure and diverse methods offered by FastPoseGait, researchers can quickly delve into pose-based gait recognition and promote development in the field. In this paper, we outline the various features of this toolbox, with the aim that our toolbox and benchmarks can further foster collaboration, facilitate reproducibility, and encourage the development of innovative algorithms for pose-based gait recognition. FastPoseGait is available at this https URL and is actively maintained. We will continue updating this report as we add new features.
https://arxiv.org/abs/2309.00794
Binary silhouettes and keypoint-based skeletons have dominated human gait recognition studies for decades, since they are easy to extract from video frames. Despite their success in gait recognition for in-the-lab environments, they usually fail in real-world scenarios due to the low information entropy of their gait representations. To achieve accurate gait recognition in the wild, this paper presents a novel gait representation, named Gait Parsing Sequence (GPS). GPSs are sequences of fine-grained human segmentation, i.e., human parsing, extracted from video frames, so they have much higher information entropy to encode the shapes and dynamics of fine-grained human parts during walking. Moreover, to effectively explore the capability of the GPS representation, we propose a novel human parsing-based gait recognition framework, named ParsingGait. ParsingGait contains a Convolutional Neural Network (CNN)-based backbone and two lightweight heads. The first head extracts global semantic features from GPSs, while the other learns the mutual information of part-level features through Graph Convolutional Networks to model the detailed dynamics of human walking. Furthermore, due to the lack of suitable datasets, we build the first parsing-based dataset for gait recognition in the wild, named Gait3D-Parsing, by extending the large-scale and challenging Gait3D dataset. Based on Gait3D-Parsing, we comprehensively evaluate our method and existing gait recognition methods. The experimental results show a significant improvement in accuracy brought by the GPS representation and the superiority of ParsingGait. The code and dataset are available at this https URL.
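The core claim here is that a multi-class parsing map carries higher information entropy than a binary silhouette. A toy check of that claim via Shannon entropy of the label distribution (image size, label count, and the uniform-random label maps are made-up illustrations, not the paper's data):

```python
import numpy as np

def shannon_entropy(label_map):
    """Shannon entropy (bits) of the label distribution in a 2-D map."""
    _, counts = np.unique(label_map, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
# Binary silhouette: background (0) vs. foreground (1).
silhouette = rng.integers(0, 2, size=(64, 44))
# Parsing map: background plus six body-part labels (head, torso, arms, legs, ...).
parsing = rng.integers(0, 7, size=(64, 44))

h_sil = shannon_entropy(silhouette)  # at most 1 bit
h_par = shannon_entropy(parsing)     # up to log2(7) ≈ 2.81 bits
```

The binary map can never exceed 1 bit per pixel, while a 7-label parsing map can approach log2(7) bits, which is the intuition behind the GPS representation.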
https://arxiv.org/abs/2308.16739
Gait recognition is a promising biometric technology for identification due to its non-invasiveness and effectiveness at long distances. However, external variations such as clothing changes and viewpoint differences pose significant challenges to gait recognition. Silhouette-based methods preserve body shape but neglect internal structure information, while skeleton-based methods preserve structure information but omit appearance. To fully exploit the complementary nature of the two modalities, a novel triple-branch gait recognition framework, TriGait, is proposed in this paper. It effectively integrates features from the skeleton and silhouette data in a hybrid fusion manner, including a two-stream network to extract static and motion features from appearance, a simple yet effective module named JSA-TC to capture dependencies between all joints, and a third branch for cross-modal learning that aligns and fuses the low-level features of the two modalities. Experimental results demonstrate the superiority and effectiveness of TriGait for gait recognition. The proposed method achieves a mean rank-1 accuracy of 96.0% over all conditions on the CASIA-B dataset and 94.3% accuracy under CL, significantly outperforming all state-of-the-art methods. The source code will be available at this https URL.
https://arxiv.org/abs/2308.13340
Gait recognition seeks correct matches for query individuals based on their unique walking patterns at a long distance. However, current methods focus solely on individual gait features, disregarding inter-personal relationships. In this paper, we reconsider gait representation, asserting that gait is not just an aggregation of individual features, but also the relationships among different subjects' gait features once reference gaits are established. From this perspective, we redefine classifier weights as reference-anchored gaits, allowing each person's gait to be described by its relationship with these references. In our work, we call this novel descriptor the Relationship Descriptor (RD). The Relationship Descriptor offers two benefits: emphasizing meaningful features and enhancing robustness. Specifically, the normalized dot product between gait features and classifier weights signifies a similarity relation, where each dimension indicates the similarity between the test sample and each training ID's gait prototype. Despite its potential, the direct use of relationship descriptors poses dimensionality challenges, since the dimension of the RD depends on the number of identities in the training set. To address this, we propose a Farthest Anchored gaits Selection algorithm and a dimension reduction method to boost gait recognition performance. Our method can be built on top of off-the-shelf pre-trained classification-based models without extra parameters. We show that RD achieves higher recognition performance than directly using the extracted features. We evaluate the effectiveness of our method on the popular GREW, Gait3D, CASIA-B, and OU-MVLP datasets, showing that our method consistently outperforms the baselines and achieves state-of-the-art performance.
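The RD computation described above (normalized dot products against per-identity classifier weights) is concrete enough to sketch. The greedy farthest-point routine below is only a rough stand-in for the paper's Farthest Anchored gaits Selection, and all dimensions are invented:

```python
import numpy as np

def relationship_descriptor(feature, class_weights):
    """RD: cosine similarity between one gait feature and each training
    identity's classifier weight (one prototype per row)."""
    f = feature / np.linalg.norm(feature)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    return w @ f  # one similarity value per training identity

def farthest_anchor_selection(class_weights, k):
    """Greedy farthest-point selection of k anchor identities
    (a simplified stand-in for the paper's selection algorithm)."""
    chosen = [0]
    while len(chosen) < k:
        # Distance from every prototype to its nearest already-chosen anchor.
        d = np.min([np.linalg.norm(class_weights - class_weights[c], axis=1)
                    for c in chosen], axis=0)
        chosen.append(int(np.argmax(d)))
    return chosen

rng = np.random.default_rng(0)
weights = rng.normal(size=(100, 256))  # 100 training IDs, 256-d prototypes
rd = relationship_descriptor(rng.normal(size=256), weights)
anchors = farthest_anchor_selection(weights, k=16)
rd_reduced = rd[anchors]  # dimension reduced from 100 to 16
```

This illustrates the dimensionality issue the abstract raises: the raw RD has one dimension per training identity, which anchor selection caps at a fixed size.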
https://arxiv.org/abs/2308.11487
Psychological trait estimation from external factors such as movement and appearance is a challenging and long-standing problem in psychology, principally based on the psychological theory of embodiment. To date, attempts to tackle this problem have utilized private small-scale datasets with intrusive body-attached sensors. Potential applications of an automated system for psychological trait estimation include estimation of occupational fatigue and psychological state, as well as marketing and advertisement. In this work, we propose PsyMo (Psychological traits from Motion), a novel, multi-purpose and multi-modal dataset for exploring psychological cues manifested in walking patterns. We gathered walking sequences from 312 subjects in 7 different walking variations and 6 camera angles. In conjunction with the walking sequences, participants filled in 6 psychological questionnaires, totalling 17 psychometric attributes related to personality, self-esteem, fatigue, aggressiveness, and mental health. We propose two evaluation protocols for psychological trait estimation. Alongside the estimation of self-reported psychological traits from gait, the dataset can be used as a drop-in replacement for benchmarking gait recognition methods. We anonymize all cues related to the identity of the subjects and publicly release only silhouettes, 2D/3D human skeletons, and 3D SMPL human meshes.
https://arxiv.org/abs/2308.10631
The analysis of patterns of walking is an important area of research that has numerous applications in security, healthcare, sports and human-computer interaction. Lately, walking patterns have been regarded as a unique fingerprinting method for automatic person identification at a distance. In this work, we propose a novel gait recognition architecture called Gait Pyramid Transformer (GaitPT) that leverages pose estimation skeletons to capture unique walking patterns, without relying on appearance information. GaitPT adopts a hierarchical transformer architecture that effectively extracts both spatial and temporal features of movement in an anatomically consistent manner, guided by the structure of the human skeleton. Our results show that GaitPT achieves state-of-the-art performance compared to other skeleton-based gait recognition works, in both controlled and in-the-wild scenarios. GaitPT obtains 82.6% average accuracy on CASIA-B, surpassing other works by a margin of 6%. Moreover, it obtains 52.16% Rank-1 accuracy on GREW, outperforming both skeleton-based and appearance-based approaches.
https://arxiv.org/abs/2308.10623
Deep learning research has made many biometric recognition solutions viable, but it requires vast training data to achieve real-world generalization. Unlike other biometric traits, such as face and ear, gait samples cannot be easily crawled from the web to form massive unconstrained datasets. As the human body has been extensively studied for different digital applications, one can rely on prior shape knowledge to overcome data scarcity. This work follows the recent trend of fitting a 3D deformable body model to gait videos using deep neural networks to obtain disentangled shape and pose representations for each frame. To enforce temporal consistency in the network, we introduce a new Linear Dynamical Systems (LDS) module and loss based on Koopman operator theory, which provides unsupervised motion regularization for the periodic nature of gait, as well as a predictive capacity for extending gait sequences. We compare LDS to the traditional adversarial training approach and use the USF HumanID and CASIA-B datasets to show that LDS can obtain better accuracy with less training data. Finally, we also show that our 3D modeling approach is much better than other 3D gait approaches at overcoming viewpoint variation under normal, bag-carrying, and clothing-change conditions.
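The LDS idea above can be illustrated with a least-squares fit of a single linear operator so that each frame predicts the next, with the one-step prediction error serving as the unsupervised regularizer. This is only a minimal sketch of the Koopman-style intuition (the 2-D sinusoidal trajectory stands in for real pose features):

```python
import numpy as np

def lds_loss(pose_seq):
    """Fit a linear operator A with least squares so that z[t+1] ≈ z[t] @ A,
    and return the mean squared one-step prediction error plus A."""
    z_prev, z_next = pose_seq[:-1], pose_seq[1:]
    A, *_ = np.linalg.lstsq(z_prev, z_next, rcond=None)
    residual = z_next - z_prev @ A
    return float(np.mean(residual ** 2)), A

# A noiseless periodic trajectory is exactly linear in a 2-D phase space
# (each step is a fixed rotation), so the LDS loss should be near zero.
t = np.linspace(0, 8 * np.pi, 400)
periodic = np.stack([np.sin(t), np.cos(t)], axis=1)
loss, A = lds_loss(periodic)
```

The fitted operator also gives the predictive capacity the abstract mentions: rolling `A` forward from the last frame extends the sequence.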
https://arxiv.org/abs/2308.07468
Graph convolutional networks have been widely applied in skeleton-based gait recognition. A key challenge in this task is to distinguish the individual walking styles of different subjects across various views. Existing state-of-the-art methods employ uniform convolutions to extract features from diverse sequences and ignore the effects of viewpoint changes. To overcome these limitations, we propose a condition-adaptive graph (CAG) convolution network that can dynamically adapt to the specific attributes of each skeleton sequence and the corresponding view angle. In contrast to using fixed weights for all joints and sequences, we introduce a joint-specific filter learning (JSFL) module in the CAG method, which produces sequence-adaptive filters at the joint level. The adaptive filters capture fine-grained patterns that are unique to each joint, enabling the extraction of diverse spatial-temporal information about body parts. Additionally, we design a view-adaptive topology learning (VATL) module that generates adaptive graph topologies. These graph topologies are used to correlate the joints adaptively according to the specific view conditions. Thus, CAG can simultaneously adjust to various walking styles and viewpoints. Experiments on the two most widely used datasets (i.e., CASIA-B and OU-MVLP) show that CAG surpasses all previous skeleton-based methods. Moreover, the recognition performance can be enhanced by simply combining CAG with appearance-based methods, demonstrating the ability of CAG to provide useful complementary information. The source code will be available at this https URL.
https://arxiv.org/abs/2308.06707
Gait recognition is one of the most promising video-based biometric technologies. The edges of silhouettes and motion are the most informative features, and previous studies have explored them separately and achieved notable results. However, due to occlusions and variations in viewing angle, gait recognition performance is often affected by the predefined spatial segmentation strategy. Moreover, traditional temporal pooling usually neglects distinctive temporal information in gait. To address the aforementioned issues, we propose a novel gait recognition framework, denoted GaitASMS, which can effectively extract adaptive structured spatial representations and naturally aggregate multi-scale temporal information. The Adaptive Structured Representation Extraction Module (ASRE) separates the edges of silhouettes using an adaptive edge mask and maximizes the representation in the semantic latent space. Moreover, the Multi-Scale Temporal Aggregation Module (MSTA) achieves effective modeling of long- and short-range temporal information through a temporally aggregated structure. Furthermore, we propose a new data augmentation, denoted random mask, to enrich the sample space of long-term occlusion and enhance the generalization of the model. Extensive experiments conducted on two datasets demonstrate the competitive advantage of the proposed method, especially in complex scenes, i.e., BG and CL. On the CASIA-B dataset, GaitASMS achieves an average accuracy of 93.5% and outperforms the baseline on rank-1 accuracy by 3.4% and 6.3% in BG and CL, respectively. The ablation experiments demonstrate the effectiveness of ASRE and MSTA.
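The random-mask augmentation is simple enough to sketch: zero out the same randomly placed rectangle in every frame of a silhouette clip to simulate a long-term occlusion. The rectangle-size limits and clip dimensions below are assumptions, not the paper's settings:

```python
import numpy as np

def random_mask(seq, max_h=16, max_w=16, rng=None):
    """Zero the same random rectangle in every frame of a (T, H, W)
    silhouette sequence, simulating a long-term occlusion."""
    if rng is None:
        rng = np.random.default_rng()
    t, h, w = seq.shape
    mh = rng.integers(1, max_h + 1)
    mw = rng.integers(1, max_w + 1)
    y = rng.integers(0, h - mh + 1)
    x = rng.integers(0, w - mw + 1)
    out = seq.copy()
    out[:, y:y + mh, x:x + mw] = 0  # same region across all frames
    return out

rng = np.random.default_rng(0)
frames = np.ones((30, 64, 44), dtype=np.uint8)  # 30-frame silhouette clip
masked = random_mask(frames, rng=rng)
```

Because the mask is shared across the whole clip, it mimics a persistent occluder rather than per-frame noise, which matches the "long-term occlusion" motivation.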
https://arxiv.org/abs/2307.15981
Gait, the manner of walking, has been proven to be a reliable biometric with uses in surveillance, marketing, and security. A promising new direction for the field is training gait recognition systems without explicit human annotations through self-supervised learning approaches. Such methods are heavily reliant on strong augmentations of the same walking sequence to induce more data variability and to simulate additional walking variations. Current data augmentation schemes are heuristic and cannot provide the necessary data variation, as they are only able to produce simple temporal and spatial distortions. In this work, we propose GaitMorph, a novel method to modify the walking variation of an input gait sequence. Our method entails training a high-compression model for gait skeleton sequences that leverages unlabelled data to construct a discrete and interpretable latent space which preserves identity-related features. Furthermore, we propose a method based on optimal transport theory to learn latent transport maps on the discrete codebook that morph gait sequences between variations. We perform extensive experiments and show that our method is suitable for synthesizing additional views for an input sequence.
https://arxiv.org/abs/2307.14713
Gait recognition holds the promise of robustly identifying subjects based on their walking patterns instead of color information. While previous approaches have performed well for curated indoor scenes, their applicability in unconstrained situations, e.g., outdoor, long-distance scenes, is significantly impeded. We propose an end-to-end GAit DEtection and Recognition (GADER) algorithm for human authentication in challenging outdoor scenarios. Specifically, GADER leverages a Double Helical Signature to detect fragments of human movement and incorporates a novel gait recognition method, which learns representations by distilling from an auxiliary RGB recognition model. At inference time, GADER uses only the silhouette modality but benefits from a more robust representation. Extensive experiments on indoor and outdoor datasets demonstrate that the proposed method outperforms the state of the art for gait recognition and verification, with a significant 20.6% improvement on unconstrained, long-distance scenes.
https://arxiv.org/abs/2307.14578
Gait recognition aims to distinguish different walking patterns by analyzing video-level human silhouettes, rather than relying on appearance information. Previous research on gait recognition has primarily focused on extracting local or global spatial-temporal representations, while overlooking the intrinsic periodic features of gait sequences, which, when fully utilized, can significantly enhance performance. In this work, we propose a plug-and-play strategy, called Temporal Periodic Alignment (TPA), which leverages the periodic nature and fine-grained temporal dependencies of gait patterns. The TPA strategy comprises two key components. The first component is Adaptive Fourier-transform Position Encoding (AFPE), which adaptively converts features and discrete-time signals into embeddings that are sensitive to periodic walking patterns. The second component is the Temporal Aggregation Module (TAM), which separates embeddings into trend and seasonal components, and extracts meaningful temporal correlations to identify primary components, while filtering out random noise. We present a simple and effective baseline method for gait recognition, based on the TPA strategy. Extensive experiments conducted on three popular public datasets (CASIA-B, OU-MVLP, and GREW) demonstrate that our proposed method achieves state-of-the-art performance on multiple benchmark tests.
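The trend/seasonal split inside the TAM can be illustrated with the classic moving-average decomposition: the moving average gives the slow trend, and the residual carries the periodic (seasonal) part. This is a generic sketch of the decomposition idea, not the paper's exact module; the window size and synthetic signal are assumptions:

```python
import numpy as np

def trend_seasonal_split(x, window=5):
    """Split a 1-D per-frame feature sequence into a trend component
    (moving average) and a seasonal component (residual)."""
    kernel = np.ones(window) / window
    pad = window // 2
    padded = np.pad(x, (pad, pad), mode="edge")  # keep output length == input
    trend = np.convolve(padded, kernel, mode="valid")
    seasonal = x - trend
    return trend, seasonal

t = np.arange(100)
# Synthetic gait-like signal: slow drift plus a period-10 step component.
signal = 0.02 * t + np.sin(2 * np.pi * t / 10)
trend, seasonal = trend_seasonal_split(signal)
```

By construction the two components sum back to the input, so the split loses nothing; the point is that periodic gait structure ends up concentrated in the seasonal part, where it can be analyzed separately from drift and noise.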
https://arxiv.org/abs/2307.13259
Gait recognition is a biometric technique that identifies individuals by their unique walking styles, which is suitable for unconstrained environments and has a wide range of applications. While current methods focus on exploiting body part-based representations, they often neglect the hierarchical dependencies between local motion patterns. In this paper, we propose a hierarchical spatio-temporal representation learning (HSTL) framework for extracting gait features from coarse to fine. Our framework starts with a hierarchical clustering analysis to recover multi-level body structures from the whole body to local details. Next, an adaptive region-based motion extractor (ARME) is designed to learn region-independent motion features. The proposed HSTL then stacks multiple ARMEs in a top-down manner, with each ARME corresponding to a specific partition level of the hierarchy. An adaptive spatio-temporal pooling (ASTP) module is used to capture gait features at different levels of detail to perform hierarchical feature mapping. Finally, a frame-level temporal aggregation (FTA) module is employed to reduce redundant information in gait sequences through multi-scale temporal downsampling. Extensive experiments on CASIA-B, OUMVLP, GREW, and Gait3D datasets demonstrate that our method outperforms the state-of-the-art while maintaining a reasonable balance between model accuracy and complexity.
https://arxiv.org/abs/2307.09856
Gait Recognition is a computer vision task aiming to identify people by their walking patterns. Existing methods show impressive results on individual datasets but lack the ability to generalize to unseen scenarios. Unsupervised Domain Adaptation (UDA) tries to adapt a model, pre-trained in a supervised manner on a source domain, to an unlabelled target domain. UDA for gait recognition is still in its infancy, and existing works have proposed solutions for limited scenarios. In this paper, we reveal a fundamental phenomenon in the adaptation of gait recognition models, in which the target domain becomes biased toward pose-based features rather than identity features, causing a significant performance drop in the identification task. We suggest a Gait Orientation-based method for Unsupervised Domain Adaptation (GOUDA) to reduce this bias. To this end, we present a novel Triplet Selection algorithm with a curriculum learning framework, aiming to adapt the embedding space by pushing away samples with similar poses and bringing closer samples with different poses. We provide extensive experiments on four widely used gait datasets, CASIA-B, OU-MVLP, GREW, and Gait3D, and on three backbones, GaitSet, GaitPart, and GaitGL, showing the superiority of our proposed method over prior works.
https://arxiv.org/abs/2307.06751
Gait recognition aims at identifying the pedestrians at a long distance by their biometric gait patterns. It is inherently challenging due to the various covariates and the properties of silhouettes (textureless and colorless), which result in two kinds of pair-wise hard samples: the same pedestrian could have distinct silhouettes (intra-class diversity) and different pedestrians could have similar silhouettes (inter-class similarity). In this work, we propose to solve the hard sample issue with a Memory-augmented Progressive Learning network (GaitMPL), including Dynamic Reweighting Progressive Learning module (DRPL) and Global Structure-Aligned Memory bank (GSAM). Specifically, DRPL reduces the learning difficulty of hard samples by easy-to-hard progressive learning. GSAM further augments DRPL with a structure-aligned memory mechanism, which maintains and models the feature distribution of each ID. Experiments on two commonly used datasets, CASIA-B and OU-MVLP, demonstrate the effectiveness of GaitMPL. On CASIA-B, we achieve the state-of-the-art performance, i.e., 88.0% on the most challenging condition (Clothing) and 93.3% on the average condition, which outperforms the other methods by at least 3.8% and 1.4%, respectively.
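The easy-to-hard idea behind DRPL can be sketched as a per-sample reweighting schedule: early in training, weight shifts toward easy samples (low loss); as training progresses, it shifts toward hard ones. The interpolation rule and numbers below are an illustrative assumption, not the paper's exact formulation:

```python
import numpy as np

def progressive_weights(losses, epoch, total_epochs):
    """Easy-to-hard dynamic reweighting: interpolate from favoring easy
    samples (low loss) to favoring hard samples (high loss)."""
    # Normalized difficulty: 0 = easiest in batch, 1 = hardest.
    difficulty = (losses - losses.min()) / (np.ptp(losses) + 1e-8)
    progress = epoch / total_epochs  # 0 at the start, 1 at the end
    w = (1 - progress) * (1 - difficulty) + progress * difficulty
    return w / w.sum()  # normalized batch weights

losses = np.array([0.1, 0.5, 2.0, 4.0])  # per-sample losses in a batch
early = progressive_weights(losses, epoch=0, total_epochs=100)
late = progressive_weights(losses, epoch=100, total_epochs=100)
```

At epoch 0 the easiest sample dominates the batch weight; by the final epoch the hardest one does, which is the curriculum the abstract describes for pair-wise hard samples.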
https://arxiv.org/abs/2306.04650
Gait recognition, which aims at identifying individuals by their walking patterns, has recently drawn increasing research attention. However, gait recognition still suffers from the conflicts between the limited binary visual clues of the silhouette and numerous covariates with diverse scales, which brings challenges to the model's adaptiveness. In this paper, we address this conflict by developing a novel MetaGait that learns to learn an omni sample adaptive representation. Towards this goal, MetaGait injects meta-knowledge, which could guide the model to perceive sample-specific properties, into the calibration network of the attention mechanism to improve the adaptiveness from the omni-scale, omni-dimension, and omni-process perspectives. Specifically, we leverage the meta-knowledge across the entire process, where Meta Triple Attention and Meta Temporal Pooling are presented respectively to adaptively capture omni-scale dependency from spatial/channel/temporal dimensions simultaneously and to adaptively aggregate temporal information through integrating the merits of three complementary temporal aggregation methods. Extensive experiments demonstrate the state-of-the-art performance of the proposed MetaGait. On CASIA-B, we achieve rank-1 accuracy of 98.7%, 96.0%, and 89.3% under three conditions, respectively. On OU-MVLP, we achieve rank-1 accuracy of 92.4%.
https://arxiv.org/abs/2306.03445
Gait is one of the most promising biometrics and aims to identify pedestrians from their walking patterns. However, prevailing methods are susceptible to confounders, resulting in networks that hardly focus on the regions that reflect effective walking patterns. To address this fundamental problem in gait recognition, we propose a Generative Counterfactual Intervention framework, dubbed GaitGCI, consisting of Counterfactual Intervention Learning (CIL) and Diversity-Constrained Dynamic Convolution (DCDC). CIL eliminates the impact of confounders by maximizing the likelihood difference between factual and counterfactual attention, while DCDC adaptively generates sample-wise factual/counterfactual attention to efficiently perceive sample-wise properties. With matrix decomposition and a diversity constraint, DCDC guarantees that the model is efficient and effective. Extensive experiments indicate that the proposed GaitGCI: 1) can effectively focus on the discriminative and interpretable regions that reflect gait patterns; 2) is model-agnostic and can be plugged into existing models to improve performance at nearly no extra cost; 3) efficiently achieves state-of-the-art performance in arbitrary scenarios (in-the-lab and in-the-wild).
https://arxiv.org/abs/2306.03428
Gait recognition is an emerging biometric technology that identifies and verifies individuals based on their walking patterns. However, many current methods are limited in their use of temporal information. In order to fully harness the potential of gait recognition, it is crucial to consider temporal features at various granularities and spans. Hence, in this paper, we propose a novel framework named GaitGS, which aggregates temporal features in the granularity dimension and the span dimension simultaneously. Specifically, a Multi-Granularity Feature Extractor (MGFE) is proposed to capture micro-motion and macro-motion information at the frame level and unit level, respectively. Moreover, we present a Multi-Span Feature Learning (MSFL) module to generate global and local temporal representations. Extensive experiments on three popular gait datasets demonstrate the state-of-the-art performance of our method, which achieves Rank-1 accuracies of 92.9% (+0.5%), 52.0% (+1.4%), and 97.5% (+0.8%) on CASIA-B, GREW, and OU-MVLP, respectively. The source code will be released soon.
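The two granularities above can be sketched with simple temporal differencing: micro-motion as frame-to-frame differences, macro-motion as differences between averages of consecutive fixed-length units of frames. The unit length, feature dimension, and random features are assumptions; the real MGFE is a learned module, not plain differencing:

```python
import numpy as np

def micro_macro_motion(features, unit=4):
    """Micro-motion: frame-level differences. Macro-motion: differences
    between mean features of consecutive `unit`-frame groups."""
    micro = np.diff(features, axis=0)  # (T-1, D): frame-level motion
    t = (features.shape[0] // unit) * unit
    units = features[:t].reshape(-1, unit, features.shape[1]).mean(axis=1)
    macro = np.diff(units, axis=0)     # (T//unit - 1, D): unit-level motion
    return micro, macro

rng = np.random.default_rng(0)
feats = rng.normal(size=(32, 128))  # 32 frames of 128-d per-frame features
micro, macro = micro_macro_motion(feats)
```

The point of keeping both streams is that the frame-level signal captures fast joint motion while the unit-level signal captures slower, stride-scale dynamics.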
https://arxiv.org/abs/2305.19700