Hierarchical Spatio-Temporal Representation Learning for Gait Recognition

2023-07-19 09:30:00
Lei Wang, Bo Liu, Fangfang Liang, Bincheng Wang

Abstract

Gait recognition is a biometric technique that identifies individuals by their unique walking styles, which is suitable for unconstrained environments and has a wide range of applications. While current methods focus on exploiting body part-based representations, they often neglect the hierarchical dependencies between local motion patterns. In this paper, we propose a hierarchical spatio-temporal representation learning (HSTL) framework for extracting gait features from coarse to fine. Our framework starts with a hierarchical clustering analysis to recover multi-level body structures from the whole body to local details. Next, an adaptive region-based motion extractor (ARME) is designed to learn region-independent motion features. The proposed HSTL then stacks multiple ARMEs in a top-down manner, with each ARME corresponding to a specific partition level of the hierarchy. An adaptive spatio-temporal pooling (ASTP) module is used to capture gait features at different levels of detail to perform hierarchical feature mapping. Finally, a frame-level temporal aggregation (FTA) module is employed to reduce redundant information in gait sequences through multi-scale temporal downsampling. Extensive experiments on CASIA-B, OUMVLP, GREW, and Gait3D datasets demonstrate that our method outperforms the state-of-the-art while maintaining a reasonable balance between model accuracy and complexity.
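
The coarse-to-fine partitioning and the multi-scale temporal downsampling mentioned in the abstract can be pictured with a minimal Python sketch. This is not the authors' implementation. Uniform horizontal strips stand in for the body hierarchy that the paper obtains by clustering analysis, and plain max pooling over lists stands in for the learned ARME, ASTP, and FTA modules. All function names, the partition levels, and the window sizes are hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence


def partition_rows(height: int, parts: int) -> List[range]:
    """Split row indices 0..height-1 into `parts` contiguous strips.

    Assumption: uniform strips, used here only as a stand-in for the
    clustering-derived body regions described in the abstract.
    """
    bounds = [i * height // parts for i in range(parts + 1)]
    return [range(bounds[i], bounds[i + 1]) for i in range(parts)]


def build_hierarchy(height: int, levels: Sequence[int]) -> List[List[range]]:
    """Return one partition per level, ordered from coarse to fine."""
    return [partition_rows(height, parts) for parts in levels]


def temporal_max_pool(seq: Sequence[float], window: int) -> List[float]:
    """Non-overlapping max pooling along the time axis.

    A trailing partial window is pooled as well, so no frame is dropped.
    """
    return [max(seq[i:i + window]) for i in range(0, len(seq), window)]


def multi_scale_downsample(
    seq: Sequence[float], windows: Sequence[int]
) -> Dict[int, List[float]]:
    """Pool the same per-frame signal at several temporal scales."""
    return {w: temporal_max_pool(seq, w) for w in windows}


if __name__ == "__main__":
    # Hypothetical setup: a 64-row silhouette split into 1, 2, 4, 8 strips.
    for level in build_hierarchy(64, [1, 2, 4, 8]):
        print([(r.start, r.stop) for r in level])

    # Hypothetical per-frame activations for one region over six frames.
    print(multi_scale_downsample([1.0, 3.0, 2.0, 5.0, 4.0, 0.0], [2, 3]))
```

Each finer level in this sketch subdivides the strips of the level above it, which mirrors the top-down stacking the abstract describes; a real model would apply learned spatio-temporal operators to each region rather than a fixed pooling rule.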

URL

https://arxiv.org/abs/2307.09856

PDF

https://arxiv.org/pdf/2307.09856.pdf

