Paper Reading AI Learner

MetaGait: Learning to Learn an Omni Sample Adaptive Representation for Gait Recognition

2023-06-06 06:53:05
Huanzhang Dou, Pengyi Zhang, Wei Su, Yunlong Yu, Xi Li

Abstract

Gait recognition, which aims at identifying individuals by their walking patterns, has recently drawn increasing research attention. However, gait recognition still suffers from the conflict between the limited binary visual clues of the silhouette and the numerous covariates with diverse scales, which challenges the model's adaptiveness. In this paper, we address this conflict by developing MetaGait, a novel framework that learns to learn an omni sample adaptive representation. Towards this goal, MetaGait injects meta-knowledge, which guides the model to perceive sample-specific properties, into the calibration network of the attention mechanism to improve adaptiveness from the omni-scale, omni-dimension, and omni-process perspectives. Specifically, we leverage meta-knowledge across the entire process: Meta Triple Attention adaptively captures omni-scale dependencies from the spatial, channel, and temporal dimensions simultaneously, while Meta Temporal Pooling adaptively aggregates temporal information by integrating the merits of three complementary temporal aggregation methods. Extensive experiments demonstrate the state-of-the-art performance of the proposed MetaGait. On CASIA-B, we achieve rank-1 accuracies of 98.7%, 96.0%, and 89.3% under the three walking conditions, respectively. On OU-MVLP, we achieve a rank-1 accuracy of 92.4%.
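The abstract does not give implementation details, but the core idea of Meta Temporal Pooling — fusing complementary temporal aggregations with sample-adaptive weights — can be sketched as follows. This toy NumPy sketch assumes the three aggregators are max, mean, and generalized-mean (GeM) pooling, and stands in a simple linear-plus-softmax map for the learned meta-knowledge calibration network; all names here (`gem_pool`, `adaptive_weights`, `meta_temporal_pool`, `W`) are illustrative, not from the paper.

```python
import numpy as np

def gem_pool(x, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling over the temporal axis (axis 0)."""
    return np.mean(np.clip(x, eps, None) ** p, axis=0) ** (1.0 / p)

def adaptive_weights(feats, W):
    """Toy 'meta' calibration: derive per-sample fusion weights from the
    clip-level mean feature via a linear map and a softmax (W is hypothetical)."""
    logits = W @ feats.mean(axis=0)        # (3,) one logit per aggregator
    e = np.exp(logits - logits.max())      # numerically stable softmax
    return e / e.sum()

def meta_temporal_pool(feats, weights):
    """Fuse three complementary temporal aggregations (max, mean, GeM)
    with sample-adaptive weights; feats has shape (T, C)."""
    pooled = np.stack([feats.max(axis=0), feats.mean(axis=0), gem_pool(feats)])
    return (np.asarray(weights)[:, None] * pooled).sum(axis=0)  # (C,)

rng = np.random.default_rng(0)
frames = rng.random((30, 64))              # 30 silhouette frames, 64-dim features
W = rng.standard_normal((3, 64))           # hypothetical calibration parameters
v = meta_temporal_pool(frames, adaptive_weights(frames, W))
```

Because the softmax weights are non-negative and sum to one, the fused vector `v` lies between the per-channel minimum and maximum of the frame features, while the weighting lets each sample lean toward the aggregator that suits it.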

URL

https://arxiv.org/abs/2306.03445

PDF

https://arxiv.org/pdf/2306.03445.pdf
