Gait recognition holds the promise of robustly identifying subjects based on walking patterns instead of appearance information. In recent years, this field has been dominated by learning methods based on two principal input representations: dense silhouette masks or sparse pose keypoints. In this work, we propose a novel, point-based Contour-Pose representation, which compactly expresses both body shape and body-part information. We further propose a local-to-global architecture, called GaitContour, to leverage this novel representation and efficiently compute subject embeddings in two stages. The first stage consists of a local transformer that extracts features from five different body regions. The second stage then aggregates the regional features to estimate a global human gait representation. Such a design significantly reduces the complexity of the attention operation and improves efficiency and performance simultaneously. Through large-scale experiments, GaitContour is shown to perform significantly better than previous point-based methods, while also being significantly more efficient than silhouette-based methods. On challenging datasets with significant distractors, GaitContour can even outperform silhouette-based methods.
https://arxiv.org/abs/2311.16497
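The local-to-global idea above can be sketched minimally in numpy: attention is run only within each body region, and the pooled regional features are concatenated into a global descriptor. The mean-pooled, identity-projection attention and the region split used here are illustrative assumptions, not the paper's actual layers.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # x: (n_points, d). Single-head attention with identity Q/K/V
    # projections, as a stand-in for the paper's local transformer.
    scores = softmax(x @ x.T / np.sqrt(x.shape[1]))
    return scores @ x

def local_to_global_embedding(points, regions):
    # points: (N, d) Contour-Pose points; regions: list of index arrays,
    # one per body region (five regions in the paper).
    regional = [self_attention(points[idx]).mean(axis=0) for idx in regions]
    return np.concatenate(regional)  # global gait descriptor

rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 8))                 # 50 points, 8-dim features
regions = np.array_split(np.arange(50), 5)     # toy split into 5 regions
emb = local_to_global_embedding(pts, regions)  # shape (40,)
```

Attention restricted to 5 regions of N/5 points costs on the order of 5·(N/5)² = N²/5 score entries versus N² for full attention, which is where the efficiency gain the abstract mentions comes from.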
The choice of representation is essential for deep gait recognition methods. Binary silhouettes and skeletal coordinates are the two dominant representations in recent literature, achieving remarkable advances in many scenarios. However, inherent challenges remain: silhouettes are not always available in unconstrained scenes, and the structural cues in skeletons have not been fully exploited. In this paper, we introduce a novel skeletal gait representation named Skeleton Map, together with SkeletonGait, a skeleton-based method that exploits the structural information in human skeleton maps. Specifically, the skeleton map represents the coordinates of human joints as a heatmap with Gaussian approximation, exhibiting a silhouette-like image devoid of exact body structure. Beyond achieving state-of-the-art performance on five popular gait datasets, more importantly, SkeletonGait uncovers novel insights about how important structural features are in describing gait and when they play a role. Furthermore, we propose a multi-branch architecture, named SkeletonGait++, to make use of complementary features from both skeletons and silhouettes. Experiments indicate that SkeletonGait++ outperforms existing state-of-the-art methods by a significant margin in various scenarios. For instance, it achieves an impressive rank-1 accuracy of over 85% on the challenging GREW dataset. All the source code will be available at this https URL.
https://arxiv.org/abs/2311.13444
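The skeleton-map construction can be sketched in a few lines of numpy: each 2D joint becomes a Gaussian blob, and reducing over joints yields the silhouette-like image the abstract describes. The image size, σ, and the per-pixel max reduction are illustrative assumptions.

```python
import numpy as np

def skeleton_map(joints, size=64, sigma=2.0):
    # joints: (K, 2) array of (x, y) joint coordinates in pixel space.
    # Each joint is rendered as a 2D Gaussian; the per-pixel maximum over
    # joints gives a silhouette-like heatmap without exact body structure.
    ys, xs = np.mgrid[0:size, 0:size]
    heat = np.zeros((size, size))
    for x, y in joints:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
        heat = np.maximum(heat, g)
    return heat

# Toy joint layout: head, shoulders, feet (coordinates are made up)
joints = np.array([[32, 10], [28, 30], [36, 30], [30, 55], [34, 55]])
hm = skeleton_map(joints)  # (64, 64), peaks of 1.0 at each joint
```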
Gait recognition has achieved promising advances in controlled settings, yet it significantly struggles in unconstrained environments due to challenges such as view changes, occlusions, and varying walking speeds. Additionally, efforts to fuse multiple modalities often yield limited improvements because of cross-modality incompatibility, particularly in outdoor scenarios. To address these issues, we present a multi-modal Hierarchy in Hierarchy network (HiH) that integrates silhouette and pose sequences for robust gait recognition. HiH features a main branch that utilizes Hierarchical Gait Decomposer (HGD) modules for depth-wise and intra-module hierarchical examination of general gait patterns from silhouette data. This approach captures motion hierarchies from overall body dynamics to detailed limb movements, facilitating the representation of gait attributes across multiple spatial resolutions. Complementing this, an auxiliary branch, based on 2D joint sequences, enriches the spatial and temporal aspects of gait analysis. It employs a Deformable Spatial Enhancement (DSE) module for pose-guided spatial attention and a Deformable Temporal Alignment (DTA) module for aligning motion dynamics through learned temporal offsets. Extensive evaluations across diverse indoor and outdoor datasets demonstrate HiH's state-of-the-art performance, affirming a well-balanced trade-off between accuracy and efficiency.
https://arxiv.org/abs/2311.11210
Human silhouette extraction is a fundamental task in computer vision with applications in various downstream tasks. However, occlusions pose a significant challenge, leading to incomplete and distorted silhouettes. To address this challenge, we introduce POISE: Pose Guided Human Silhouette Extraction under Occlusions, a novel self-supervised fusion framework that enhances accuracy and robustness in human silhouette prediction. By combining initial silhouette estimates from a segmentation model with human joint predictions from a 2D pose estimation model, POISE leverages the complementary strengths of both approaches, effectively integrating precise body shape information and spatial information to tackle occlusions. Furthermore, the self-supervised nature of POISE eliminates the need for costly annotations, making it scalable and practical. Extensive experimental results demonstrate its superiority in improving silhouette extraction under occlusions, with promising results in downstream tasks such as gait recognition. The code for our method is available at this https URL.
https://arxiv.org/abs/2311.05077
Gait analysis leverages unique walking patterns for person identification and assessment across multiple domains. Among the methods used for gait analysis, skeleton-based approaches have shown promise due to their robust and interpretable features. However, these methods often rely on hand-crafted spatial-temporal graphs that are based on human anatomy, disregarding the particularities of the dataset and task. This paper proposes a novel method to simplify the spatial-temporal graph representation for gait-based gender estimation, improving interpretability without losing performance. Our approach employs two models, an upstream and a downstream model, which together adjust the adjacency matrix for each walking instance, thereby removing the fixed nature of the graph. By employing the Straight-Through Gumbel-Softmax trick, our model is trainable end-to-end. We demonstrate the effectiveness of our approach on the CASIA-B dataset for gait-based gender estimation. The resulting graphs are interpretable and differ qualitatively from the fixed graphs used in existing models. Our research contributes to enhancing the explainability and task-specific adaptability of gait recognition, promoting more efficient and reliable gait-based biometrics.
https://arxiv.org/abs/2310.03396
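The Straight-Through Gumbel-Softmax trick the paper relies on can be sketched as follows. This is a numpy forward pass only; in an autograd framework the commented line is what makes the hard sample differentiable, and the per-edge keep/drop parameterisation is an illustrative assumption about how the adjacency matrix could be sampled.

```python
import numpy as np

def gumbel_softmax_st(logits, tau=1.0, rng=None):
    # Straight-Through Gumbel-Softmax: the forward pass uses a discrete
    # one-hot sample, while the soft relaxation carries the gradients.
    rng = rng or np.random.default_rng()
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0,1) noise
    y_soft = np.exp((logits + g) / tau)
    y_soft /= y_soft.sum(-1, keepdims=True)
    y_hard = np.eye(logits.shape[-1])[y_soft.argmax(-1)]
    # in autograd: y = y_hard + (y_soft - stop_gradient(y_soft))
    return y_hard, y_soft

# Per-instance decision for each candidate edge of the graph: keep / drop
logits = np.log(np.array([[0.7, 0.3], [0.2, 0.8]]))  # 2 edges, 2 choices
hard, soft = gumbel_softmax_st(logits, rng=np.random.default_rng(0))
```

Annealing `tau` toward 0 during training makes the soft sample approach the discrete one, which is why the resulting per-instance graphs stay interpretable.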
Most current gait recognition methods suffer from poor interpretability and high computational cost. To improve interpretability, we investigate gait features in the embedding space based on Koopman operator theory. The transition matrix in this space, namely the Koopman operator, captures complex kinematic features of gait cycles. The diagonal elements of the operator matrix can represent the overall motion trend, providing a physically meaningful descriptor. To reduce the computational cost of our algorithm, we use a reversible autoencoder to reduce the model size and eliminate convolutional layers to compress its depth, resulting in fewer floating-point operations. Experimental results on multiple datasets show that our method reduces computational cost to 1% of that of state-of-the-art methods while achieving a competitive recognition accuracy of 98% on non-occluded datasets.
https://arxiv.org/abs/2309.14764
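The core idea can be sketched by treating a sequence of gait embeddings as a linear dynamical system and estimating a finite-dimensional Koopman operator by least squares. On a toy rotational "gait cycle" below, the recovered operator's diagonal reflects the overall motion trend, as the abstract notes; the embedding itself (the paper's reversible autoencoder) is not shown and the toy data are an assumption.

```python
import numpy as np

def fit_koopman(Z):
    # Z: (T, d) sequence of gait embeddings over one or more cycles.
    # Fit the linear transition z_{t+1} ~= K z_t by least squares; K is a
    # finite-dimensional approximation of the Koopman operator.
    X, Y = Z[:-1], Z[1:]
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)  # solves X @ W ~= Y
    return W.T                                 # so that z_{t+1} ~= K @ z_t

# Toy periodic embedding: a 2D rotation, one "gait cycle" per 20 frames
theta = 2 * np.pi / 20
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Z = np.stack([np.linalg.matrix_power(R, t) @ np.array([1.0, 0.0])
              for t in range(40)])
K = fit_koopman(Z)  # recovers R exactly on this noiseless toy sequence
```

Here the diagonal of `K` equals cos(θ), a single number summarising the rotation rate of the cycle, which is the kind of physically meaningful descriptor the abstract refers to.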
Gait recognition (GR) is a growing biometric modality used for person identification from a distance through visual cameras. GR provides a secure and reliable alternative to fingerprint and face recognition, as its signals are harder to forge. Furthermore, its resistance to spoofing makes GR suitable for all types of environments. With the rise of deep learning, steady strides have been made in GR technology, with promising results in various contexts. As video surveillance becomes more prevalent, new obstacles arise, such as ensuring uniform performance evaluation across different protocols, reliable recognition despite shifting lighting conditions, fluctuations in gait patterns, and protecting privacy. This survey aims to give an overview of GR and analyze the environmental elements and complications that could affect it in comparison to other biometric recognition systems. The primary goal is to examine the existing deep learning (DL) techniques employed for human GR that may generate new research opportunities.
https://arxiv.org/abs/2309.10144
Powered ankle prostheses effectively assist people with lower-limb amputation in performing daily activities. High-performance prostheses with adjustable compliance and the capability to predict and implement the amputee's intent are crucial for them to be comparable to or better than a real limb. However, current designs fail to provide simple yet effective joint compliance with full potential for modification, and lack an accurate real-time gait prediction method. This paper proposes an innovative design of a powered ankle prosthesis with a serial elastic actuator (SEA), and puts forward an MLP-based gait recognition method that can accurately and continuously predict more gait parameters for motion sensing and control. The prosthesis mimics the biological joint with similar weight, torque, and power, and can assist walking at up to 4 m/s. A new design of planar torsional spring is proposed for the SEA, with better stiffness, endurance, and potential for modification than current designs. The gait recognition system simultaneously generates locomotive speed, gait phase, ankle angle, and angular velocity utilizing only the signals of a single IMU, holding advantages in continuity, adaptability across the speed range, accuracy, and multi-functionality.
https://arxiv.org/abs/2309.08323
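A minimal sketch of the MLP-based predictor described above, assuming a flattened single-IMU signal window as input and the four outputs the abstract lists (locomotive speed, gait phase, ankle angle, angular velocity). The layer sizes, window length, and ReLU activations are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def mlp_forward(x, weights):
    # x: (d_in,) flattened IMU window; weights: list of (W, b) pairs.
    # Regresses four continuous gait parameters from a single IMU signal.
    for W, b in weights[:-1]:
        x = np.maximum(0.0, x @ W + b)  # ReLU hidden layers
    W, b = weights[-1]
    return x @ W + b                    # linear output layer

rng = np.random.default_rng(0)
sizes = [60, 32, 16, 4]  # e.g. 6-axis IMU x 10-frame window -> 4 outputs
weights = [(rng.normal(scale=0.1, size=(a, b)), np.zeros(b))
           for a, b in zip(sizes[:-1], sizes[1:])]
out = mlp_forward(rng.normal(size=60), weights)  # [speed, phase, angle, ang_vel]
```

Regressing continuous parameters (rather than classifying discrete gait phases) is what gives the continuity and speed-range adaptability the abstract claims.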
We present FastPoseGait, an open-source toolbox for pose-based gait recognition based on PyTorch. Our toolbox supports a set of cutting-edge pose-based gait recognition algorithms and a variety of related benchmarks. Unlike other pose-based projects that focus on a single algorithm, FastPoseGait integrates several state-of-the-art (SOTA) algorithms under a unified framework, incorporating both the latest advancements and best practices to ease the comparison of effectiveness and efficiency. In addition, to promote future research on pose-based gait recognition, we provide numerous pre-trained models and detailed benchmark results, which offer valuable insights and serve as a reference for further investigations. By leveraging the highly modular structure and diverse methods offered by FastPoseGait, researchers can quickly delve into pose-based gait recognition and promote development in the field. In this paper, we outline the various features of this toolbox, hoping that our toolbox and benchmarks can further foster collaboration, facilitate reproducibility, and encourage the development of innovative algorithms for pose-based gait recognition. FastPoseGait is available at this https URL and is actively maintained. We will continue updating this report as we add new features.
https://arxiv.org/abs/2309.00794
Binary silhouettes and keypoint-based skeletons have dominated human gait recognition studies for decades since they are easy to extract from video frames. Despite their success in gait recognition for in-the-lab environments, they usually fail in real-world scenarios due to their low information entropy for gait representations. To achieve accurate gait recognition in the wild, this paper presents a novel gait representation, named Gait Parsing Sequence (GPS). GPSs are sequences of fine-grained human segmentation, i.e., human parsing, extracted from video frames, so they have much higher information entropy to encode the shapes and dynamics of fine-grained human parts during walking. Moreover, to effectively explore the capability of the GPS representation, we propose a novel human parsing-based gait recognition framework, named ParsingGait. ParsingGait contains a Convolutional Neural Network (CNN)-based backbone and two lightweight heads. The first head extracts global semantic features from GPSs, while the other learns the mutual information of part-level features through Graph Convolutional Networks to model the detailed dynamics of human walking. Furthermore, due to the lack of suitable datasets, we build the first parsing-based dataset for gait recognition in the wild, named Gait3D-Parsing, by extending the large-scale and challenging Gait3D dataset. Based on Gait3D-Parsing, we comprehensively evaluate our method and existing gait recognition methods. The experimental results show a significant improvement in accuracy brought by the GPS representation and the superiority of ParsingGait. The code and dataset are available at this https URL.
https://arxiv.org/abs/2308.16739
Gait recognition is a promising biometric technology for identification due to its non-invasiveness and long-distance applicability. However, external variations such as clothing changes and viewpoint differences pose significant challenges to gait recognition. Silhouette-based methods preserve body shape but neglect internal structure information, while skeleton-based methods preserve structure information but omit appearance. To fully exploit the complementary nature of the two modalities, a novel triple-branch gait recognition framework, TriGait, is proposed in this paper. It effectively integrates features from the skeleton and silhouette data in a hybrid fusion manner, including a two-stream network to extract static and motion features from appearance, a simple yet effective module named JSA-TC to capture dependencies between all joints, and a third branch for cross-modal learning by aligning and fusing low-level features of the two modalities. Experimental results demonstrate the superiority and effectiveness of TriGait for gait recognition. The proposed method achieves a mean rank-1 accuracy of 96.0% over all conditions on the CASIA-B dataset and 94.3% accuracy for CL, significantly outperforming all the state-of-the-art methods. The source code will be available at this https URL.
https://arxiv.org/abs/2308.13340
Gait recognition seeks correct matches for query individuals by their unique walking patterns at a long distance. However, current methods focus solely on individual gait features, disregarding inter-personal relationships. In this paper, we reconsider gait representation, asserting that gait is not just an aggregation of individual features, but also reflects the relationships among different subjects' gait features once reference gaits are established. From this perspective, we redefine classifier weights as reference-anchored gaits, allowing each person's gait to be described by their relationship with these references. In our work, we call this novel descriptor the Relationship Descriptor (RD). The Relationship Descriptor offers two benefits: emphasizing meaningful features and enhancing robustness. To be specific, the normalized dot product between gait features and classifier weights signifies a similarity relation, where each dimension indicates the similarity between the test sample and each training ID's gait prototype, respectively. Despite its potential, the direct use of relationship descriptors poses dimensionality challenges, since the dimension of RD depends on the training set's identity count. To address this, we propose a Farthest Anchored gaits Selection algorithm and a dimension reduction method to boost gait recognition performance. Our method can be built on top of off-the-shelf pre-trained classification-based models without extra parameters. We show that RD achieves higher recognition performance than directly using extracted features. We evaluate the effectiveness of our method on the popular GREW, Gait3D, CASIA-B, and OU-MVLP datasets, showing that our method consistently outperforms the baselines and achieves state-of-the-art performances.
https://arxiv.org/abs/2308.11487
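The descriptor itself reduces to cosine similarities between a query feature and the classifier's weight rows (one reference gait per training identity). A minimal sketch follows; the Farthest Anchored gaits Selection and dimension reduction steps are omitted, and the toy shapes are assumptions.

```python
import numpy as np

def relationship_descriptor(feat, prototypes):
    # feat: (d,) query gait feature; prototypes: (C, d) classifier weights,
    # one reference-anchored gait per training identity. Each RD dimension
    # is the normalized dot product (cosine) with one reference.
    f = feat / np.linalg.norm(feat)
    P = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return P @ f  # (C,) Relationship Descriptor

rng = np.random.default_rng(0)
protos = rng.normal(size=(100, 16))                 # 100 training identities
rd = relationship_descriptor(rng.normal(size=16), protos)
```

Matching is then done between descriptors rather than raw features; note the output dimension equals the training identity count, which is exactly the dimensionality issue the abstract's selection algorithm addresses.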
Psychological trait estimation from external factors such as movement and appearance is a challenging and long-standing problem in psychology, and is principally based on the psychological theory of embodiment. To date, attempts to tackle this problem have utilized private small-scale datasets with intrusive body-attached sensors. Potential applications of an automated system for psychological trait estimation include estimation of occupational fatigue and psychological state, as well as marketing and advertising. In this work, we propose PsyMo (Psychological traits from Motion), a novel, multi-purpose and multi-modal dataset for exploring psychological cues manifested in walking patterns. We gathered walking sequences from 312 subjects in 7 different walking variations and 6 camera angles. In conjunction with walking sequences, participants filled in 6 psychological questionnaires, totalling 17 psychometric attributes related to personality, self-esteem, fatigue, aggressiveness and mental health. We propose two evaluation protocols for psychological trait estimation. Alongside the estimation of self-reported psychological traits from gait, the dataset can be used as a drop-in replacement for benchmarking gait recognition methods. We anonymize all cues related to the identity of the subjects and publicly release only silhouettes, 2D / 3D human skeletons and 3D SMPL human meshes.
https://arxiv.org/abs/2308.10631
The analysis of patterns of walking is an important area of research that has numerous applications in security, healthcare, sports and human-computer interaction. Lately, walking patterns have been regarded as a unique fingerprinting method for automatic person identification at a distance. In this work, we propose a novel gait recognition architecture called Gait Pyramid Transformer (GaitPT) that leverages pose estimation skeletons to capture unique walking patterns, without relying on appearance information. GaitPT adopts a hierarchical transformer architecture that effectively extracts both spatial and temporal features of movement in an anatomically consistent manner, guided by the structure of the human skeleton. Our results show that GaitPT achieves state-of-the-art performance compared to other skeleton-based gait recognition works, in both controlled and in-the-wild scenarios. GaitPT obtains 82.6% average accuracy on CASIA-B, surpassing other works by a margin of 6%. Moreover, it obtains 52.16% Rank-1 accuracy on GREW, outperforming both skeleton-based and appearance-based approaches.
https://arxiv.org/abs/2308.10623
Deep learning research has made many biometric recognition solutions viable, but it requires vast training data to achieve real-world generalization. Unlike other biometric traits, such as face and ear, gait samples cannot be easily crawled from the web to form massive unconstrained datasets. As the human body has been extensively studied for different digital applications, one can rely on prior shape knowledge to overcome data scarcity. This work follows the recent trend of fitting a 3D deformable body model to gait videos using deep neural networks to obtain disentangled shape and pose representations for each frame. To enforce temporal consistency in the network, we introduce a new Linear Dynamical Systems (LDS) module and loss based on Koopman operator theory, which provides an unsupervised motion regularization for the periodic nature of gait, as well as a predictive capacity for extending gait sequences. We compare LDS to the traditional adversarial training approach and use the USF HumanID and CASIA-B datasets to show that LDS can obtain better accuracy with less training data. Finally, we also show that our 3D modeling approach is much better than other 3D gait approaches in overcoming viewpoint variation under normal, bag-carrying and clothing change conditions.
https://arxiv.org/abs/2308.07468
Graph convolutional networks have been widely applied in skeleton-based gait recognition. A key challenge in this task is to distinguish the individual walking styles of different subjects across various views. Existing state-of-the-art methods employ uniform convolutions to extract features from diverse sequences and ignore the effects of viewpoint changes. To overcome these limitations, we propose a condition-adaptive graph (CAG) convolution network that can dynamically adapt to the specific attributes of each skeleton sequence and the corresponding view angle. In contrast to using fixed weights for all joints and sequences, we introduce a joint-specific filter learning (JSFL) module in the CAG method, which produces sequence-adaptive filters at the joint level. The adaptive filters capture fine-grained patterns that are unique to each joint, enabling the extraction of diverse spatial-temporal information about body parts. Additionally, we design a view-adaptive topology learning (VATL) module that generates adaptive graph topologies. These graph topologies are used to correlate the joints adaptively according to the specific view conditions. Thus, CAG can simultaneously adjust to various walking styles and viewpoints. Experiments on the two most widely used datasets (i.e., CASIA-B and OU-MVLP) show that CAG surpasses all previous skeleton-based methods. Moreover, the recognition performance can be enhanced by simply combining CAG with appearance-based methods, demonstrating the ability of CAG to provide useful complementary information. The source code will be available at this https URL.
https://arxiv.org/abs/2308.06707
Gait recognition is one of the most promising video-based biometric technologies. Silhouette edges and motion are the most informative features, and previous studies have explored them separately and achieved notable results. However, due to occlusions and variations in viewing angles, their gait recognition performance is often affected by the predefined spatial segmentation strategy. Moreover, traditional temporal pooling usually neglects distinctive temporal information in gait. To address the aforementioned issues, we propose a novel gait recognition framework, denoted as GaitASMS, which can effectively extract adaptive structured spatial representations and naturally aggregate multi-scale temporal information. The Adaptive Structured Representation Extraction Module (ASRE) separates the edge of silhouettes by using an adaptive edge mask and maximizes the representation in the semantic latent space. Moreover, the Multi-Scale Temporal Aggregation Module (MSTA) achieves effective modeling of long- and short-range temporal information through a temporally aggregated structure. Furthermore, we propose a new data augmentation, denoted random mask, to enrich the sample space of long-term occlusion and enhance the generalization of the model. Extensive experiments conducted on two datasets demonstrate the competitive advantage of the proposed method, especially in complex scenes, i.e., BG (bag-carrying) and CL (clothing change). On the CASIA-B dataset, GaitASMS achieves an average accuracy of 93.5% and outperforms the baseline on rank-1 accuracies by 3.4% and 6.3%, respectively, in BG and CL. The ablation experiments demonstrate the effectiveness of ASRE and MSTA.
https://arxiv.org/abs/2307.15981
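A hedged sketch of a "random mask" style augmentation on a binary silhouette: zeroing a random horizontal band to mimic long-term occlusion. The band shape and the fraction range are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

def random_mask(silhouette, max_frac=0.3, rng=None):
    # silhouette: (H, W) binary mask. Zero out a random horizontal band
    # covering 5-30% of the height, simulating a long-term occlusion.
    rng = rng or np.random.default_rng()
    h = silhouette.shape[0]
    band = int(h * rng.uniform(0.05, max_frac))
    top = rng.integers(0, h - band)
    out = silhouette.copy()
    out[top:top + band] = 0
    return out

sil = np.ones((64, 44))  # toy all-foreground silhouette, CASIA-B-like size
aug = random_mask(sil, rng=np.random.default_rng(0))
```

Applied per frame (or with the same band across a clip), this enriches the training distribution with occluded samples, which is the generalization benefit the abstract claims for BG/CL-like conditions.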
Gait, the manner of walking, has been proven to be a reliable biometric with uses in surveillance, marketing and security. A promising new direction for the field is training gait recognition systems without explicit human annotations, through self-supervised learning approaches. Such methods are heavily reliant on strong augmentations for the same walking sequence to induce more data variability and to simulate additional walking variations. Current data augmentation schemes are heuristic and cannot provide the necessary data variation as they are only able to provide simple temporal and spatial distortions. In this work, we propose GaitMorph, a novel method to modify the walking variation for an input gait sequence. Our method entails the training of a high-compression model for gait skeleton sequences that leverages unlabelled data to construct a discrete and interpretable latent space, which preserves identity-related features. Furthermore, we propose a method based on optimal transport theory to learn latent transport maps on the discrete codebook that morph gait sequences between variations. We perform extensive experiments and show that our method is suitable to synthesize additional views for an input sequence.
https://arxiv.org/abs/2307.14713
Gait recognition holds the promise of robustly identifying subjects based on their walking patterns instead of color information. While previous approaches have performed well for curated indoor scenes, their applicability in unconstrained situations, e.g. outdoor, long-distance scenes, is significantly impeded. We propose an end-to-end GAit DEtection and Recognition (GADER) algorithm for human authentication in challenging outdoor scenarios. Specifically, GADER leverages a Double Helical Signature to detect the fragment of human movement and incorporates a novel gait recognition method, which learns representations by distilling from an auxiliary RGB recognition model. At inference time, GADER only uses the silhouette modality but benefits from a more robust representation. Extensive experiments on indoor and outdoor datasets demonstrate that the proposed method outperforms the state of the art for gait recognition and verification, with a significant 20.6% improvement on unconstrained, long-distance scenes.
https://arxiv.org/abs/2307.14578
Gait recognition aims to distinguish different walking patterns by analyzing video-level human silhouettes, rather than relying on appearance information. Previous research on gait recognition has primarily focused on extracting local or global spatial-temporal representations, while overlooking the intrinsic periodic features of gait sequences, which, when fully utilized, can significantly enhance performance. In this work, we propose a plug-and-play strategy, called Temporal Periodic Alignment (TPA), which leverages the periodic nature and fine-grained temporal dependencies of gait patterns. The TPA strategy comprises two key components. The first component is Adaptive Fourier-transform Position Encoding (AFPE), which adaptively converts features and discrete-time signals into embeddings that are sensitive to periodic walking patterns. The second component is the Temporal Aggregation Module (TAM), which separates embeddings into trend and seasonal components, and extracts meaningful temporal correlations to identify primary components, while filtering out random noise. We present a simple and effective baseline method for gait recognition, based on the TPA strategy. Extensive experiments conducted on three popular public datasets (CASIA-B, OU-MVLP, and GREW) demonstrate that our proposed method achieves state-of-the-art performance on multiple benchmark tests.
https://arxiv.org/abs/2307.13259
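The trend/seasonal split inside TAM can be illustrated with a centred moving average: the trend is the smoothed signal and the seasonal (periodic) part is the residual. This is a classical decomposition used here as a stand-in; the module's learned version and the Fourier-based position encoding differ.

```python
import numpy as np

def trend_seasonal_split(x, window=5):
    # x: (T,) one embedding dimension over time. The trend is a centred
    # moving average; the seasonal (periodic) component is the residual,
    # so trend + seasonal reconstructs x exactly.
    pad = window // 2
    xp = np.pad(x, pad, mode="edge")
    kernel = np.ones(window) / window
    trend = np.convolve(xp, kernel, mode="valid")
    return trend, x - trend

t = np.arange(60, dtype=float)
x = 0.05 * t + np.sin(2 * np.pi * t / 12)  # slow drift + gait-like period
trend, seasonal = trend_seasonal_split(x)
```

Separating the two lets the periodic component be aligned and correlated across time while the trend and random noise are handled apart, which is the intuition behind TAM's filtering described in the abstract.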