Paper Reading AI Learner

GaitContour: Efficient Gait Recognition based on a Contour-Pose Representation

2023-11-27 17:06:25
Yuxiang Guo, Anshul Shah, Jiang Liu, Rama Chellappa, Cheng Peng

Abstract

Gait recognition holds the promise of robustly identifying subjects based on walking patterns rather than appearance information. In recent years, this field has been dominated by learning methods based on two principal input representations: dense silhouette masks or sparse pose keypoints. In this work, we propose a novel, point-based Contour-Pose representation, which compactly expresses both body shape and body-part information. We further propose a local-to-global architecture, called GaitContour, to leverage this novel representation and efficiently compute subject embeddings in two stages. The first stage consists of a local transformer that extracts features from five different body regions. The second stage then aggregates the regional features to estimate a global human gait representation. This design significantly reduces the complexity of the attention operation and improves efficiency and performance simultaneously. Through large-scale experiments, GaitContour is shown to perform significantly better than previous point-based methods, while also being significantly more efficient than silhouette-based methods. On challenging datasets with significant distractors, GaitContour can even outperform silhouette-based methods.
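The abstract describes a two-stage, local-to-global attention scheme: attention is first applied within each of five body regions over the point-based Contour-Pose features, and the resulting regional features are then aggregated globally. The sketch below illustrates why this reduces attention cost; it is a minimal illustration only, assuming single-head attention with identity projections, hypothetical feature dimensions, and an arbitrary region grouping (none of these details come from the paper).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Single-head self-attention with identity Q/K/V projections
    # (illustrative stand-in for a learned transformer layer).
    scores = x @ x.T / np.sqrt(x.shape[-1])
    return softmax(scores, axis=-1) @ x

def local_to_global(points, regions):
    """points: (N, d) point features; regions: list of index arrays,
    one per body region (five in the paper's setup)."""
    # Stage 1: local attention within each region, pooled to one token each.
    region_tokens = np.stack([self_attention(points[idx]).mean(axis=0)
                              for idx in regions])
    # Stage 2: global attention over the few region tokens,
    # pooled into a single subject embedding.
    return self_attention(region_tokens).mean(axis=0)

# Hypothetical usage: 50 points, 8-dim features, 5 equal regions.
rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 8))
regions = [np.arange(i * 10, (i + 1) * 10) for i in range(5)]
embedding = local_to_global(pts, regions)
```

With N points split into R regions, attention costs roughly O(R·(N/R)²) locally plus O(R²) globally, versus O(N²) for full attention over all points, which is one plausible reading of the claimed complexity reduction.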

URL

https://arxiv.org/abs/2311.16497

PDF

https://arxiv.org/pdf/2311.16497.pdf

