Paper Reading AI Learner

It Takes Two: Accurate Gait Recognition in the Wild via Cross-granularity Alignment

2024-11-16 08:54:27
Jinkai Zheng, Xinchen Liu, Boyue Zhang, Chenggang Yan, Jiyong Zhang, Wu Liu, Yongdong Zhang

Abstract

Existing studies on gait recognition have primarily used sequences of either binary silhouettes or human parsing to encode the shapes and dynamics of persons during walking. Silhouettes exhibit accurate segmentation quality and robustness to environmental variations, but their low information entropy may result in sub-optimal performance. In contrast, human parsing provides fine-grained part segmentation with higher information entropy, but its segmentation quality may deteriorate in complex environments. To exploit the advantages of silhouette and parsing and overcome their limitations, this paper proposes a novel cross-granularity alignment gait recognition method, named XGait, to unleash the power of gait representations of different granularity. To achieve this goal, XGait first uses two branches of backbone encoders to map the silhouette sequences and the parsing sequences into two latent spaces, respectively. Moreover, to explore the complementary knowledge across the features of the two representations, we design the Global Cross-granularity Module (GCM) and the Part Cross-granularity Module (PCM) after the two encoders. In particular, the GCM aims to enhance the quality of parsing features by leveraging global features from silhouettes, while the PCM aligns the dynamics of human parts between silhouette and parsing features using the high information entropy in parsing sequences. In addition, to effectively guide the alignment of the two representations with different granularity at the part level, an elaborately designed learnable division mechanism is proposed for the parsing features. Comprehensive experiments on two large-scale gait datasets not only show the superior performance of XGait, with Rank-1 accuracy of 80.5% on Gait3D and 88.3% on CCPG, but also reflect the robustness of the learned features even under challenging conditions like occlusions and cloth changes.
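
For readers unfamiliar with the Rank-1 metric quoted in the abstract, the following is a minimal sketch of how it is commonly computed in retrieval-style evaluation: each probe sequence is matched to its nearest gallery sequence in feature space, and the match counts as correct if the identities agree. The function names, the toy feature vectors, and the use of Euclidean distance are illustrative assumptions, not taken from the paper; the official Gait3D and CCPG protocols may use a different distance or additional filtering rules.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank1_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids):
    """Fraction of probes whose nearest gallery feature has the same identity.

    Assumption: plain nearest-neighbour search with Euclidean distance.
    """
    hits = 0
    for feat, pid in zip(probe_feats, probe_ids):
        # Index of the gallery feature closest to this probe.
        nearest = min(
            range(len(gallery_feats)),
            key=lambda i: euclidean(feat, gallery_feats[i]),
        )
        if gallery_ids[nearest] == pid:
            hits += 1
    return hits / len(probe_feats)


# Hypothetical toy data: three gallery identities, four probes.
gallery_feats = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
gallery_ids = ["A", "B", "C"]
probe_feats = [[0.1, -0.1], [0.9, 1.2], [2.0, 2.0], [4.0, 4.5]]
probe_ids = ["A", "B", "C", "C"]

print(rank1_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids))
```

In this toy example the third probe lies closer to identity B than to its true identity C, so three of the four probes are matched correctly.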

URL

https://arxiv.org/abs/2411.10742

PDF

https://arxiv.org/pdf/2411.10742.pdf

