
GaitSTR: Gait Recognition with Sequential Two-stream Refinement

2024-04-02 22:39:35
Wanrong Zheng, Haidong Zhu, Zhaoheng Zheng, Ram Nevatia


Gait recognition aims to identify a person from their walking sequence. It is a useful biometric modality because gait can be observed from long distances without the subject's cooperation. Silhouettes and skeletons are the two primary modalities for representing a walking sequence. Silhouette sequences lose detailed part information when body segments overlap, and they are affected by carried objects and clothing. Skeletons, comprising joints and the bones connecting them, provide more accurate part information for different segments; however, they are sensitive to occlusions and low-quality images, which causes inconsistent frame-wise results within a sequence. In this paper, we explore a two-stream representation of skeletons (joints and bones) for gait recognition, alongside silhouettes. By fusing silhouette and skeleton data, we refine both skeleton streams through self-correction in graph convolution, along with cross-modal correction using the temporal consistency of silhouettes. We demonstrate that, with refined skeletons, the gait recognition model improves further on public gait recognition datasets compared with state-of-the-art methods, without extra annotations.
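To make the two-stream skeleton idea concrete, here is a minimal sketch of how a bone stream could be derived from a joint stream. It is an illustration only, not the paper's implementation: the five-joint topology `TOY_EDGES`, the helper names `joints_to_bones` and `two_stream`, and the choice to define a bone as the child joint minus its parent joint are all assumptions made for this example.

```python
# Hypothetical toy skeleton: (child, parent) joint-index pairs.
# This topology is an assumption for illustration, not the one used in the paper.
TOY_EDGES = [(1, 0), (2, 1), (3, 1), (4, 1)]


def joints_to_bones(joints, edges):
    """Return one 2D bone vector per edge, computed as child joint minus parent joint."""
    bones = []
    for child, parent in edges:
        cx, cy = joints[child]
        px, py = joints[parent]
        bones.append((cx - px, cy - py))
    return bones


def two_stream(sequence, edges=TOY_EDGES):
    """Pair each frame's joint coordinates with the bones derived from them."""
    return [(frame, joints_to_bones(frame, edges)) for frame in sequence]


if __name__ == "__main__":
    # One frame of five 2D joints (made-up coordinates).
    frame = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (-1.0, 1.0), (0.0, 2.0)]
    for joints, bones in two_stream([frame]):
        print("joints:", joints)
        print("bones: ", bones)
```

In this sketch the joint stream carries positions and the bone stream carries the offsets between connected joints. The refinement steps described in the abstract (graph-convolution self-correction and cross-modal correction from silhouettes) are not modeled here.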



