Edges Are All You Need: Robust Gait Recognition via Label-Free Structure

2026-03-04 05:48:07
Chao Zhang, Zhuang Zheng, Ruixin Li, Zhanyong Mei

Abstract

Gait recognition is a non-intrusive biometric technique for security applications, yet existing studies are dominated by silhouette- and parsing-based representations. Silhouettes are sparse and miss internal structural details, limiting discriminability. Parsing enriches silhouettes with part-level structures, but relies heavily on upstream human parsers (e.g., label granularity and boundary precision), leading to unstable performance across datasets and sometimes even inferior results to silhouettes. We revisit gait representations from a structural perspective and describe a design space defined by edge density and supervision form: silhouettes use sparse boundary edges with weak single-label supervision, while parsing uses denser cues with strong semantic priors. In this space, we identify an underexplored paradigm: dense part-level structure without explicit semantic labels, and introduce SKETCH as a new visual modality for gait recognition. Sketch extracts high-frequency structural cues (e.g., limb articulations and self-occlusion contours) directly from RGB images via edge-based detectors in a label-free manner. We further show that label-guided parsing and label-free sketch are semantically decoupled and structurally complementary. Based on this, we propose SKETCHGAIT, a hierarchically disentangled multi-modal framework with two independent streams for modality-specific learning and a lightweight early-stage fusion branch to capture structural complementarity. Extensive experiments on SUSTech1K and CCPG validate the proposed modality and framework: SketchGait achieves 92.9% Rank-1 on SUSTech1K and 93.1% mean Rank-1 on CCPG.
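
The abstract says Sketch is obtained from RGB frames with edge-based detectors and no labels, but it does not name a specific detector. As a rough illustration only, the snippet below uses a plain Sobel gradient as a stand-in; the function name `sobel_edge_map`, the `threshold` parameter, and the choice of Sobel are assumptions made for this sketch, not the authors' pipeline.

```python
import numpy as np


def sobel_edge_map(rgb, threshold=0.2):
    """Return a binary edge map (H, W) from an RGB image (H, W, 3).

    Hypothetical helper: Sobel is an assumed stand-in for the
    unspecified edge-based detector mentioned in the abstract.
    """
    # Convert to grayscale with the standard luma weights.
    gray = rgb.astype(np.float64) @ np.array([0.299, 0.587, 0.114])

    # Replicate border pixels so the output keeps the input size.
    padded = np.pad(gray, 1, mode="edge")

    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T

    h, w = gray.shape
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    # Accumulate the 3x3 filter response via shifted views of the image.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window

    # Gradient magnitude, normalized to [0, 1] when any edge is present.
    mag = np.hypot(gx, gy)
    peak = mag.max()
    if peak > 0:
        mag /= peak

    # No semantic labels are involved: the map is a pure intensity-gradient cue.
    return (mag >= threshold).astype(np.uint8)


if __name__ == "__main__":
    # Synthetic frame: dark left half, bright right half.
    frame = np.zeros((8, 8, 3), dtype=np.uint8)
    frame[:, 4:] = 255
    print(sobel_edge_map(frame))
```

Unlike a silhouette, which keeps only the outer boundary, a gradient-based map of this kind also responds to intensity changes inside the body region, which is the sort of internal structure the abstract attributes to Sketch.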

URL

https://arxiv.org/abs/2603.05537

PDF

https://arxiv.org/pdf/2603.05537.pdf

