Surveillance footage represents a valuable resource and an opportunity for conducting gait analysis. However, the typical low quality and high noise levels in such footage can severely impact the accuracy of pose estimation algorithms, which are foundational for reliable gait analysis. Existing literature suggests a direct correlation between the efficacy of pose estimation and the subsequent gait analysis results. A common mitigation strategy involves fine-tuning pose estimation models on noisy data to improve robustness. However, this approach may degrade the downstream model's performance on the original high-quality data, leading to a trade-off that is undesirable in practice. We propose a processing pipeline that incorporates a task-targeted artifact correction model specifically designed to pre-process and enhance surveillance footage before pose estimation. Our artifact correction model is optimized to work alongside a state-of-the-art pose estimation network, HRNet, without requiring repeated fine-tuning of the pose estimation model. Furthermore, we propose a simple and robust method for automatically obtaining pose-annotated low-quality videos for training the artifact correction model. We systematically evaluate our artifact correction model on a range of noisy surveillance data and demonstrate that our approach not only improves pose estimation on low-quality surveillance footage, but also preserves the integrity of pose estimation on high-resolution footage. Our experiments show a clear enhancement in gait analysis performance, supporting the viability of the proposed method as a superior alternative to direct fine-tuning strategies. Our contributions pave the way for more reliable gait analysis using surveillance data in real-world applications, regardless of data quality.
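As a toy illustration of the training scheme described in this abstract, here is a minimal PyTorch sketch: a frozen stand-in pose network provides automatic pseudo-labels on clean frames, and a small correction model is trained so that poses estimated on corrected degraded frames match those labels. All module names, the degradation model, and the shapes are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCorrectionNet(nn.Module):
    """Stand-in artifact-correction model (the paper's network is larger)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)              # residual enhancement

def degrade(frames):
    # Synthetic low-quality proxy (downscale + noise), standing in for codec artifacts.
    low = F.interpolate(frames, scale_factor=0.25, mode="bilinear")
    low = F.interpolate(low, size=frames.shape[-2:], mode="bilinear")
    return (low + 0.05 * torch.randn_like(low)).clamp(0, 1)

pose_net = nn.Conv2d(3, 17, 3, padding=1)    # stand-in for a frozen HRNet (17 joint heatmaps)
for p in pose_net.parameters():
    p.requires_grad_(False)                  # the pose estimator is never fine-tuned

corrector = TinyCorrectionNet()
opt = torch.optim.Adam(corrector.parameters(), lr=1e-4)

clean = torch.rand(4, 3, 128, 128)           # high-quality frames (placeholder data)
with torch.no_grad():
    target = pose_net(clean)                 # automatic pose pseudo-labels, no human labeling
pred = pose_net(corrector(degrade(clean)))   # pose on corrected low-quality frames
loss = F.mse_loss(pred, target)              # task-targeted objective through the frozen pose net
loss.backward()
opt.step()
```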
https://arxiv.org/abs/2404.12183
Gait is a behavioral biometric modality that can be used to recognize individuals by the way they walk from a far distance. Most existing gait recognition approaches rely on either silhouettes or skeletons, while their joint use is underexplored. Features from silhouettes and skeletons can provide complementary information for more robust recognition against appearance changes or pose estimation errors. To exploit the benefits of both silhouette and skeleton features, we propose a new gait recognition network, referred to as GaitPoint+. Our approach models skeleton key points as a 3D point cloud, and employs a computational-complexity-conscious 3D point processing approach to extract skeleton features, which are then combined with silhouette features for improved accuracy. Since silhouette- or CNN-based methods already require a considerable amount of computational resources, it is preferable that the key point learning module be faster and more lightweight. We present a detailed analysis of the utilization of every human key point after traditional max-pooling, and show that while elbow and ankle points are used most commonly, many useful points are discarded by max-pooling. Thus, we present a Recycling Max-Pooling module that recycles some of the discarded points during the processing of skeleton point clouds, achieving further performance improvement. We provide a comprehensive set of experimental results showing that (i) incorporating skeleton features obtained by a point-based 3D point cloud processing approach boosts the performance of three different state-of-the-art silhouette- and CNN-based baselines; (ii) recycling the discarded points increases the accuracy further. Ablation studies are also provided to show the effectiveness and contribution of different components of our approach.
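The Recycling Max-Pooling idea lends itself to a compact sketch. The following PyTorch snippet (the shapes and the guard for the empty case are our own assumptions, not the paper's code) pools per-point features, identifies points that never contributed to the max, and pools those a second time:

```python
import torch

feats = torch.randn(128, 256)               # per-point features: (num_points, channels)
pooled, idx = feats.max(dim=0)              # standard max-pooling + contributing point ids
used = torch.zeros(feats.shape[0], dtype=torch.bool)
used[idx.unique()] = True                   # points that survived max-pooling
discarded = feats[~used]                    # points max-pooling threw away

if discarded.numel() > 0:
    recycled = discarded.max(dim=0).values  # pool the discarded points once more
    descriptor = torch.cat([pooled, recycled])   # (2 * channels,) enriched descriptor
else:
    descriptor = torch.cat([pooled, pooled])     # degenerate case: every point was used
```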
https://arxiv.org/abs/2404.10213
Current gait recognition research mainly focuses on identifying pedestrians captured by the same type of sensor, neglecting the fact that individuals may be captured by different sensors in order to adapt to various environments. A more practical approach should involve cross-modality matching across different sensors. Hence, this paper focuses on investigating the problem of cross-modality gait recognition, with the objective of accurately identifying pedestrians across diverse vision sensors. We present CrossGait, inspired by the feature alignment strategy and capable of retrieving across diverse data modalities. Specifically, we approach the cross-modality recognition task by initially extracting features within each modality and subsequently aligning these features across modalities. To further enhance cross-modality performance, we propose a Prototypical Modality-shared Attention Module that learns modality-shared features from two modality-specific features. Additionally, we design a Cross-modality Feature Adapter that transforms the learned modality-specific features into a unified feature space. Extensive experiments conducted on the SUSTech1K dataset demonstrate the effectiveness of CrossGait: (1) it exhibits promising cross-modality ability in retrieving pedestrians across various modalities from different sensors in diverse scenes, and (2) CrossGait not only learns modality-shared features for cross-modality gait recognition but also maintains modality-specific features for single-modality recognition.
https://arxiv.org/abs/2404.04120
Gait recognition aims to identify a person based on their walking sequences, serving as a useful biometric modality because it can be observed from long distances without requiring cooperation from the subject. In representing a person's walking sequence, silhouettes and skeletons are the two primary modalities used. Silhouette sequences lack detailed part information when overlapping occurs between different body segments and are affected by carried objects and clothing. Skeletons, comprising joints and the bones connecting them, provide more accurate part information for different segments; however, they are sensitive to occlusions and low-quality images, causing inconsistencies in frame-wise results within a sequence. In this paper, we explore the use of a two-stream representation of skeletons for gait recognition, alongside silhouettes. By fusing the combined data of silhouettes and skeletons, we refine the two skeleton streams, joints and bones, through self-correction in graph convolution, along with cross-modal correction with temporal consistency from silhouettes. We demonstrate that with refined skeletons, the performance of the gait recognition model can achieve further improvement on public gait recognition datasets compared with state-of-the-art methods, without extra annotations.
https://arxiv.org/abs/2404.02345
In this paper, we address the challenge of making ViT models more robust to unseen affine transformations. Such robustness becomes useful in various recognition tasks, such as face recognition, when image alignment failures occur. We propose a novel method called KP-RPE, which leverages key points (e.g., facial landmarks) to make ViT more resilient to scale, translation, and pose variations. We begin with the observation that Relative Position Encoding (RPE) is a good way to bring affine transform generalization to ViTs. RPE, however, can only inject the model with the prior knowledge that nearby pixels are more important than far pixels. Keypoint RPE (KP-RPE) is an extension of this principle, where the significance of pixels is not solely dictated by their proximity but also by their relative positions to specific keypoints within the image. By anchoring the significance of pixels around keypoints, the model can more effectively retain spatial relationships, even when those relationships are disrupted by affine transformations. We show the merit of KP-RPE in face and gait recognition. The experimental results demonstrate its effectiveness in improving face recognition performance on low-quality images, particularly where alignment is prone to failure. Code and pre-trained models are available.
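One simplified reading of the keypoint-anchored idea, sketched in PyTorch: each patch's attention logits receive a learned bias computed from that patch's offsets to the detected keypoints. The MLP, shapes, and broadcasting scheme here are illustrative assumptions; the actual KP-RPE formulation in the paper differs in detail.

```python
import torch
import torch.nn as nn

num_patches, num_kp, heads = 49, 5, 4
patch_xy = torch.rand(num_patches, 2)             # patch-center coordinates in [0, 1]
keypts   = torch.rand(num_kp, 2)                  # detected landmarks (e.g. eyes, nose)

rel = patch_xy[:, None, :] - keypts[None, :, :]   # (patches, keypoints, 2) offsets
mlp = nn.Sequential(nn.Linear(num_kp * 2, 32), nn.ReLU(), nn.Linear(32, heads))
bias = mlp(rel.flatten(1))                        # per-patch, per-head bias from keypoint geometry

attn_logits = torch.randn(heads, num_patches, num_patches)
attn_logits = attn_logits + bias.t()[:, None, :]  # bias keys by their keypoint-relative position
attn = attn_logits.softmax(dim=-1)
```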
https://arxiv.org/abs/2403.14852
Gait recognition stands as one of the most pivotal remote identification technologies and is progressively expanding across research and industrial communities. However, existing gait recognition methods rely heavily on task-specific upstream models driven by supervised learning to provide explicit gait representations, which inevitably introduces expensive annotation costs and can cause cumulative errors. Escaping from this trend, this work explores effective gait representations based on the all-purpose knowledge produced by task-agnostic Large Vision Models (LVMs) and proposes a simple yet efficient gait framework, termed BigGait. Specifically, the Gait Representation Extractor (GRE) in BigGait effectively transforms all-purpose knowledge into implicit gait features in an unsupervised manner, drawing on the design principles of established gait representation construction approaches. Experimental results on CCPG, CASIA-B* and SUSTech1K indicate that BigGait significantly outperforms previous methods in both self-domain and cross-domain tasks in most cases, and provides a more practical paradigm for learning the next-generation gait representation. Finally, we delve into prospective challenges and promising directions in LVM-based gait recognition, aiming to inspire future work on this emerging topic. The source code will be available at this https URL.
https://arxiv.org/abs/2402.19122
Gait, an unobtrusive biometric, is valued for its capability to identify individuals at a distance, across changes in outfit and environmental conditions. This study challenges the prevailing assumption that vision-based gait recognition, in particular skeleton-based gait recognition, relies primarily on motion patterns, revealing a significant role of the implicit anthropometric information encoded in the walking sequence. We show through a comparative analysis that removing height information leads to notable performance degradation across three models and two benchmarks (CASIA-B and GREW). Furthermore, we propose a spatial transformer model that processes individual poses, disregarding any temporal information, and achieves unreasonably good accuracy, emphasizing the bias towards appearance information and indicating spurious correlations in existing benchmarks. These findings underscore the need for a nuanced understanding of the interplay between motion and appearance in vision-based gait recognition, prompting a reevaluation of the methodological assumptions in this field. Our experiments indicate that "in-the-wild" datasets are less prone to spurious correlations, highlighting the need for more diverse and large-scale datasets to advance the field.
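A plausible way to run the "remove height information" ablation this abstract mentions is to rescale every skeleton to unit height before feeding it to a recognizer. The NumPy sketch below is our own construction under that assumption, not the paper's code:

```python
import numpy as np

def remove_height(pose):
    """Rescale a 2D skeleton to unit height so identity cues from
    stature are suppressed (one way to run such an ablation)."""
    y = pose[:, 1]
    h = y.max() - y.min()                 # subject height in this frame
    centered = pose - pose.mean(axis=0)   # drop absolute position as well
    return centered / (h + 1e-8)

pose = np.random.rand(17, 2) * [44, 64]   # toy 17-joint pose in pixel units
norm = remove_height(pose)
```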
https://arxiv.org/abs/2402.08320
Gait recognition is a promising biometric method that aims to identify pedestrians from their unique walking patterns. The silhouette modality, renowned for its easy acquisition, simple structure, sparse representation, and convenient modeling, has been widely employed in controlled in-the-lab research. However, as gait recognition rapidly advances from in-the-lab to in-the-wild scenarios, various conditions raise significant challenges for the silhouette modality, including 1) unidentifiable low-quality silhouettes (abnormal segmentation, severe occlusion, or even non-human shapes), and 2) identifiable but challenging silhouettes (background noise, non-standard posture, slight occlusion). To address these challenges, we revisit the gait recognition pipeline and approach gait recognition from a quality perspective, namely QAGait. Specifically, we propose a series of cost-effective quality assessment strategies, including Maximal Connect Area and Template Match to eliminate background noise and unidentifiable silhouettes, and an Alignment strategy to handle non-standard postures. We also propose two quality-aware loss functions to integrate silhouette quality into optimization within the embedding space. Extensive experiments demonstrate that QAGait can guarantee both gait reliability and performance enhancement. Furthermore, our quality assessment strategies can seamlessly integrate with existing gait datasets, showcasing our superiority. Code is available at this https URL.
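A guess at what a maximal-connected-area-style quality check could look like in practice: keep only the largest connected component of a silhouette and reject frames whose foreground is implausibly small. The function name and thresholds below are hypothetical, not the paper's settings.

```python
import numpy as np
from scipy import ndimage

def assess_silhouette(mask, min_area_ratio=0.01):
    """Keep the largest connected component; reject frames whose
    foreground is implausibly small (illustrative thresholds)."""
    labels, n = ndimage.label(mask > 0)
    if n == 0:
        return None                                 # nothing segmented -> unidentifiable
    sizes = ndimage.sum(mask > 0, labels, range(1, n + 1))
    largest = (labels == (np.argmax(sizes) + 1))    # drop background noise blobs
    if largest.sum() / mask.size < min_area_ratio:
        return None                                 # too small to be a person
    return largest.astype(np.uint8)

frame = (np.random.rand(64, 44) > 0.8).astype(np.uint8)   # toy noisy silhouette
clean = assess_silhouette(frame)
```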
https://arxiv.org/abs/2401.13531
Existing gait recognition benchmarks mostly include minor clothing variations in laboratory environments, but lack persistent changes in appearance over time and space. In this paper, we propose the first in-the-wild benchmark, CCGait, for cloth-changing gait recognition, which incorporates diverse clothing changes, indoor and outdoor scenes, and multi-modal statistics over 92 days. To further address the coupling effect of clothing and viewpoint variations, we propose a hybrid approach, HybridGait, that exploits both temporal dynamics and the projected 2D information of 3D human meshes. Specifically, we introduce a Canonical Alignment Spatial-Temporal Transformer (CA-STT) module to encode human joint position-aware features, and fully exploit 3D dense priors via a Silhouette-guided Deformation with 3D-2D Appearance Projection (SilD) strategy. Our contributions are twofold: we provide a challenging benchmark, CCGait, that captures realistic appearance changes across expanded time and space, and we propose a hybrid framework, HybridGait, that outperforms prior works on the CCGait and Gait3D benchmarks. Our project page is available at this https URL.
https://arxiv.org/abs/2401.00271
Re-identifying participants in ultra-distance running competitions can be daunting due to the extensive distances and constantly changing terrain. To overcome these challenges, computer vision techniques have been developed to analyze runners' faces, the numbers on their bibs, and their clothing. However, our study presents a novel gait-based approach for runner re-identification (re-ID) that leverages various pre-trained human action recognition (HAR) models and loss functions. Our results show that this approach provides promising results for re-identifying runners in ultra-distance competitions. Furthermore, we investigate the significance of distinct human body movements when athletes approach their endurance limits and their potential impact on re-ID accuracy. Our study examines how recognition of a runner's gait is affected by a competition's critical point (CP), defined as a moment of severe fatigue at the point where the finish line comes into view, just a few kilometers away. We aim to determine how this CP can improve the accuracy of athlete re-ID. Our experimental results demonstrate that gait recognition can be significantly enhanced (up to a 9% increase in mAP) as athletes approach this point. This highlights the potential of utilizing gait recognition in real-world scenarios, such as ultra-distance competitions or long-duration surveillance tasks.
https://arxiv.org/abs/2401.00080
Gait recognition is a biometric technology that has received extensive attention. Most existing gait recognition algorithms are unimodal, and the few multimodal gait recognition algorithms perform multimodal fusion only once; neither fully exploits the complementary advantages of the multiple modalities. In this paper, considering the temporal and spatial characteristics of gait data, we propose a multi-stage feature fusion strategy (MSFFS), which performs multimodal fusion at different stages in the feature extraction process. Also, we propose an adaptive feature fusion module (AFFM) that considers the semantic association between silhouettes and skeletons. The fusion process fuses different silhouette areas with their most related skeleton joints. Since visual appearance changes and time passage co-occur within a gait period, we propose a multiscale spatial-temporal feature extractor (MSSTFE) to learn the spatial-temporal linkage features thoroughly. Specifically, MSSTFE extracts and aggregates spatial-temporal linkage information at different spatial scales. Combining the strategy and modules mentioned above, we propose a multi-stage adaptive feature fusion (MSAFF) neural network, which shows state-of-the-art performance in many experiments on three datasets. Besides, MSAFF is equipped with feature dimensional pooling (FD Pooling), which can significantly reduce the dimension of the gait representations without hindering accuracy. Code: this https URL
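FD Pooling is described only at a high level; one generic way to pool along the feature dimension (which may or may not match the paper's exact design) is to fold the feature axis into groups and take a max per group:

```python
import torch

feat = torch.randn(8, 1024)                      # gait representations (toy batch)
k = 4                                            # pooling width along the feature dimension
pooled = feat.view(8, -1, k).max(dim=2).values   # (8, 256): a 4x smaller embedding
```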
https://arxiv.org/abs/2312.14410
Gait datasets are essential for gait research. However, this paper observes that present benchmarks, whether conventional constrained or emerging real-world datasets, fall short regarding covariate diversity. To bridge this gap, we undertake an arduous 20-month effort to collect a cross-covariate gait recognition (CCGR) dataset. The CCGR dataset has 970 subjects and about 1.6 million sequences; almost every subject has 33 views and 53 different covariates. Compared to existing datasets, CCGR has both population and individual-level diversity. In addition, the views and covariates are well labeled, enabling the analysis of the effects of different factors. CCGR provides multiple types of gait data, including RGB, parsing, silhouette, and pose, offering researchers a comprehensive resource for exploration. In order to delve deeper into addressing cross-covariate gait recognition, we propose parsing-based gait recognition (ParsingGait) by utilizing the newly proposed parsing data. We have conducted extensive experiments. Our main results show: 1) Cross-covariate emerges as a pivotal challenge for practical applications of gait recognition. 2) ParsingGait demonstrates remarkable potential for further advancement. 3) Alarmingly, existing SOTA methods achieve less than 43% accuracy on the CCGR, highlighting the urgency of exploring cross-covariate gait recognition. Link: this https URL.
https://arxiv.org/abs/2312.14404
While gait recognition has seen many advances in recent years, the occlusion problem has largely been ignored. This problem is especially important for gait recognition from uncontrolled outdoor sequences at range, since any small obstruction can affect the recognition system. Most current methods assume the availability of complete body information while extracting gait features. When parts of the body are occluded, these methods may hallucinate and output a corrupted gait signature as they try to look for body parts that are not present in the input at all. To address this, we exploit the learned occlusion type while extracting identity features from videos. Thus, in this work, we propose an occlusion-aware gait recognition method that can be used to build intrinsic occlusion awareness into potentially any state-of-the-art gait recognition method. Our experiments on the challenging GREW and BRIAR datasets show that networks enhanced with this occlusion awareness perform better at recognition tasks than their counterparts trained on similar occlusions.
https://arxiv.org/abs/2312.02290
Gait recognition holds the promise of robustly identifying subjects based on walking patterns instead of appearance information. In recent years, this field has been dominated by learning methods based on two principal input representations: dense silhouette masks or sparse pose keypoints. In this work, we propose a novel, point-based Contour-Pose representation, which compactly expresses both body shape and body part information. We further propose a local-to-global architecture, called GaitContour, to leverage this novel representation and efficiently compute subject embeddings in two stages. The first stage consists of a local transformer that extracts features from five different body regions. The second stage then aggregates the regional features to estimate a global human gait representation. Such a design significantly reduces the complexity of the attention operation and improves efficiency and performance simultaneously. Through large-scale experiments, GaitContour is shown to perform significantly better than previous point-based methods, while also being significantly more efficient than silhouette-based methods. On challenging datasets with significant distractors, GaitContour can even outperform silhouette-based methods.
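The two-stage design can be mimicked in a few lines of PyTorch: a local transformer attends within each of the five body regions, and a global transformer attends over the five resulting region tokens. Dimensions, layer counts, and the pooling choices below are illustrative assumptions.

```python
import torch
import torch.nn as nn

def encoder():
    return nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=1)

local_tf, global_tf = encoder(), encoder()

points = torch.randn(5, 20, 64)               # 5 body regions x 20 Contour-Pose points x 64-d
region_tokens = local_tf(points).mean(dim=1)  # stage 1: attention only within each region
embedding = global_tf(region_tokens[None]).mean(dim=1)  # stage 2: attention over 5 region tokens
# Attention cost falls from (5*20)^2 pairwise terms to roughly 5*20^2 + 5^2.
```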
https://arxiv.org/abs/2311.16497
The choice of representations is essential for deep gait recognition methods. Binary silhouettes and skeletal coordinates are the two dominant representations in recent literature, achieving remarkable advances in many scenarios. However, inherent challenges remain: silhouettes are not always guaranteed in unconstrained scenes, and structural cues from skeletons have not been fully utilized. In this paper, we introduce a novel skeletal gait representation named Skeleton Map, together with SkeletonGait, a skeleton-based method that exploits structural information from human skeleton maps. Specifically, the skeleton map represents the coordinates of human joints as a heatmap with Gaussian approximation, exhibiting a silhouette-like image devoid of exact body structure. Beyond achieving state-of-the-art performance on five popular gait datasets, more importantly, SkeletonGait uncovers novel insights about how important structural features are in describing gait and when they play a role. Furthermore, we propose a multi-branch architecture, named SkeletonGait++, to make use of complementary features from both skeletons and silhouettes. Experiments indicate that SkeletonGait++ outperforms existing state-of-the-art methods by a significant margin in various scenarios. For instance, it achieves an impressive rank-1 accuracy of over 85% on the challenging GREW dataset. All the source code will be available at this https URL.
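Rendering a skeleton map as described, joints drawn as Gaussian blobs summed into one image, is straightforward. A minimal NumPy sketch follows; the resolution and sigma are arbitrary choices, not the paper's settings.

```python
import numpy as np

def skeleton_map(joints, size=64, sigma=2.0):
    """Render joint coordinates (normalized to [0, 1]) as a summed
    Gaussian heatmap -- a silhouette-like image without exact body shape."""
    ys, xs = np.mgrid[0:size, 0:size]
    heat = np.zeros((size, size), dtype=np.float32)
    for x, y in joints * (size - 1):
        heat += np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return np.clip(heat, 0, 1)

pose = np.random.rand(17, 2)          # 17 COCO-style joints (toy values)
img = skeleton_map(pose)
```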
https://arxiv.org/abs/2311.13444
Gait recognition has achieved promising advances in controlled settings, yet it struggles significantly in unconstrained environments due to challenges such as view changes, occlusions, and varying walking speeds. Additionally, efforts to fuse multiple modalities often yield limited improvements because of cross-modality incompatibility, particularly in outdoor scenarios. To address these issues, we present a multi-modal Hierarchy in Hierarchy network (HiH) that integrates silhouette and pose sequences for robust gait recognition. HiH features a main branch that utilizes Hierarchical Gait Decomposer (HGD) modules for depth-wise and intra-module hierarchical examination of general gait patterns from silhouette data. This approach captures motion hierarchies from overall body dynamics to detailed limb movements, facilitating the representation of gait attributes across multiple spatial resolutions. Complementing this, an auxiliary branch, based on 2D joint sequences, enriches the spatial and temporal aspects of gait analysis. It employs a Deformable Spatial Enhancement (DSE) module for pose-guided spatial attention and a Deformable Temporal Alignment (DTA) module for aligning motion dynamics through learned temporal offsets. Extensive evaluations across diverse indoor and outdoor datasets demonstrate HiH's state-of-the-art performance, affirming a well-balanced trade-off between accuracy and efficiency.
https://arxiv.org/abs/2311.11210
Human silhouette extraction is a fundamental task in computer vision with applications in various downstream tasks. However, occlusions pose a significant challenge, leading to incomplete and distorted silhouettes. To address this challenge, we introduce POISE: Pose Guided Human Silhouette Extraction under Occlusions, a novel self-supervised fusion framework that enhances accuracy and robustness in human silhouette prediction. By combining initial silhouette estimates from a segmentation model with human joint predictions from a 2D pose estimation model, POISE leverages the complementary strengths of both approaches, effectively integrating precise body shape information and spatial information to tackle occlusions. Furthermore, the self-supervised nature of POISE eliminates the need for costly annotations, making it scalable and practical. Extensive experimental results demonstrate its superiority in improving silhouette extraction under occlusions, with promising results in downstream tasks such as gait recognition. The code for our method is available at this https URL.
https://arxiv.org/abs/2311.05077
Gait analysis leverages unique walking patterns for person identification and assessment across multiple domains. Among the methods used for gait analysis, skeleton-based approaches have shown promise due to their robust and interpretable features. However, these methods often rely on hand-crafted spatial-temporal graphs that are based on human anatomy, disregarding the particularities of the dataset and task. This paper proposes a novel method to simplify the spatial-temporal graph representation for gait-based gender estimation, improving interpretability without losing performance. Our approach employs two models, an upstream and a downstream model, that can adjust the adjacency matrix for each walking instance, thereby removing the fixed nature of the graph. By employing the Straight-Through Gumbel-Softmax trick, our model is trainable end-to-end. We demonstrate the effectiveness of our approach on the CASIA-B dataset for gait-based gender estimation. The resulting graphs are interpretable and differ qualitatively from the fixed graphs used in existing models. Our research contributes to enhancing the explainability and task-specific adaptability of gait recognition, promoting more efficient and reliable gait-based biometrics.
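The Straight-Through Gumbel-Softmax trick for learning an adjacency matrix can be sketched directly with PyTorch's built-in `F.gumbel_softmax`; the two-class edge parameterization and the symmetrization below are our assumptions about one reasonable setup, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

num_joints = 17
edge_logits = torch.nn.Parameter(torch.zeros(num_joints, num_joints, 2))  # (off, on) per edge

# Straight-through Gumbel-Softmax: discrete 0/1 edges in the forward pass,
# a differentiable soft relaxation in the backward pass.
sample = F.gumbel_softmax(edge_logits, tau=1.0, hard=True)
adj = sample[..., 1]                   # 1 where the "edge on" class was sampled
adj = torch.maximum(adj, adj.t())      # symmetrize for an undirected skeleton graph

loss = adj.sum()                       # placeholder for the downstream GCN objective
loss.backward()                        # gradients reach edge_logits end-to-end
```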
https://arxiv.org/abs/2310.03396
Most current gait recognition methods suffer from poor interpretability and high computational cost. To improve interpretability, we investigate gait features in an embedding space based on Koopman operator theory. The transition matrix in this space, namely the Koopman operator, captures complex kinematic features of gait cycles. The diagonal elements of the operator matrix can represent the overall motion trend, providing a physically meaningful descriptor. To reduce the computational cost of our algorithm, we use a reversible autoencoder to reduce the model size and eliminate convolutional layers to compress its depth, resulting in fewer floating-point operations. Experimental results on multiple datasets show that our method reduces computational cost to 1% of that of state-of-the-art methods while achieving a competitive recognition accuracy of 98% on non-occlusion datasets.
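The Koopman-operator descriptor reduces to a least-squares fit in the embedding space: stack consecutive embedded states and solve for the matrix mapping each state to its successor. A minimal NumPy sketch with toy data follows; the paper obtains the embeddings from its reversible autoencoder rather than at random.

```python
import numpy as np

# Embedded gait states z_1..z_T (e.g. autoencoder latents); toy data here.
T, d = 30, 8
Z = np.random.randn(T, d)

X, Y = Z[:-1].T, Z[1:].T          # column-stacked pairs (z_t, z_{t+1})
K = Y @ np.linalg.pinv(X)         # least-squares fit of z_{t+1} ~ K z_t (DMD-style)
trend = np.diag(K)                # diagonal entries as an overall motion-trend descriptor
```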
https://arxiv.org/abs/2309.14764
Gait recognition (GR) is a growing biometric modality used for person identification from a distance through visual cameras. GR provides a secure and reliable alternative to fingerprint and face recognition, as it is hard to produce false gait signals that pass as authentic. Furthermore, its resistance to spoofing makes GR suitable for all types of environments. With the rise of deep learning, steadily improving strides have been made in GR technology, with promising results in various contexts. As video surveillance becomes more prevalent, new obstacles arise, such as ensuring uniform performance evaluation across different protocols, reliable recognition despite shifting lighting conditions, fluctuations in gait patterns, and protecting privacy. This survey aims to give an overview of GR and to analyze the environmental factors and complications that could affect it, in comparison to other biometric recognition systems. The primary goal is to examine the existing deep learning (DL) techniques employed for human GR, which may generate new research opportunities.
https://arxiv.org/abs/2309.10144