The Paradox of Motion: Evidence for Spurious Correlations in Skeleton-based Gait Recognition Models

2024-02-13 09:33:12
Andy Cătrună, Adrian Cosma, Emilian Rădoi

Abstract

Gait, an unobtrusive biometric, is valued for its capability to identify individuals at a distance, across external outfits and environmental conditions. This study challenges the prevailing assumption that vision-based gait recognition, in particular skeleton-based gait recognition, relies primarily on motion patterns, revealing a significant role of the implicit anthropometric information encoded in the walking sequence. We show through a comparative analysis that removing height information leads to notable performance degradation across three models and two benchmarks (CASIA-B and GREW). Furthermore, we propose a spatial transformer model processing individual poses, disregarding any temporal information, which achieves unreasonably good accuracy, emphasizing the bias towards appearance information and indicating spurious correlations in existing benchmarks. These findings underscore the need for a nuanced understanding of the interplay between motion and appearance in vision-based gait recognition, prompting a reevaluation of the methodological assumptions in this field. Our experiments indicate that "in-the-wild" datasets are less prone to spurious correlations, prompting the need for more diverse and large scale datasets for advancing the field.
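The abstract does not say how height information is removed from the skeletons. As a minimal sketch of one plausible approach, the following assumes each pose is a list of 2D joint coordinates, centers it on the mean joint position, and rescales it so its vertical extent is 1. The names `remove_height` and `remove_height_sequence` and the choice of normalization are illustrative assumptions, not the authors' procedure.

```python
from typing import List, Tuple

Joint = Tuple[float, float]   # (x, y) image coordinates of one joint
Pose = List[Joint]            # all joints of one frame


def remove_height(pose: Pose, eps: float = 1e-8) -> Pose:
    """Center a 2D pose and rescale it so its vertical extent is 1.

    Hypothetical normalization: the paper may use a different one.
    """
    xs = [x for x, _ in pose]
    ys = [y for _, y in pose]
    # Center on the mean joint position to drop absolute location.
    cx = sum(xs) / len(xs)
    cy = sum(ys) / len(ys)
    # Use the vertical extent of the skeleton as a proxy for height.
    height = max(ys) - min(ys)
    scale = height if height > eps else 1.0  # guard against degenerate poses
    return [((x - cx) / scale, (y - cy) / scale) for x, y in pose]


def remove_height_sequence(sequence: List[Pose]) -> List[Pose]:
    """Apply the per-frame normalization to every pose in a walking sequence."""
    return [remove_height(pose) for pose in sequence]
```

Under this normalization, two skeletons that differ only in overall size map to the same coordinates, so a model trained on the output cannot use body height directly as an identity cue. Proportions between limbs remain available to it.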

URL

https://arxiv.org/abs/2402.08320

PDF

https://arxiv.org/pdf/2402.08320.pdf

