
Reducing Training Demands for 3D Gait Recognition with Deep Koopman Operator Constraints

2023-08-14 21:39:33
Cole Hill, Mauricio Pamplona Segundo, Sudeep Sarkar

Abstract

Deep learning research has made many biometric recognition solutions viable, but it requires vast training data to achieve real-world generalization. Unlike other biometric traits, such as face and ear, gait samples cannot be easily crawled from the web to form massive unconstrained datasets. As the human body has been extensively studied for different digital applications, one can rely on prior shape knowledge to overcome data scarcity. This work follows the recent trend of fitting a 3D deformable body model to gait videos using deep neural networks to obtain disentangled shape and pose representations for each frame. To enforce temporal consistency in the network, we introduce a new Linear Dynamical Systems (LDS) module and loss based on Koopman operator theory, which provides an unsupervised motion regularization for the periodic nature of gait, as well as a predictive capacity for extending gait sequences. We compare LDS to the traditional adversarial training approach and use the USF HumanID and CASIA-B datasets to show that LDS can obtain better accuracy with less training data. Finally, we also show that our 3D modeling approach is much better than other 3D gait approaches in overcoming viewpoint variation under normal, bag-carrying, and clothing-change conditions.
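The abstract does not spell out the LDS module or its loss, so the following is only a toy sketch of the general Koopman-style idea it alludes to: if a latent state evolves approximately linearly, z[t+1] ≈ K z[t], then K can be fit without labels, the one-step prediction error can serve as a regularizer, and repeated application of K extends a sequence. Everything here is an assumption made for illustration and not the authors' implementation: the 2D latent state, the synthetic periodic trajectory standing in for per-frame pose features, and the hypothetical helper names `fit_linear_operator`, `linear_dynamics_loss`, and `rollout`.

```python
import math


def fit_linear_operator(states):
    """Least-squares fit of a 2x2 matrix K such that states[t+1] ~= K @ states[t]."""
    a = [[0.0, 0.0], [0.0, 0.0]]  # accumulates sum of y x^T
    g = [[0.0, 0.0], [0.0, 0.0]]  # accumulates sum of x x^T
    for x, y in zip(states[:-1], states[1:]):
        for i in range(2):
            for j in range(2):
                a[i][j] += y[i] * x[j]
                g[i][j] += x[i] * x[j]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    g_inv = [[g[1][1] / det, -g[0][1] / det],
             [-g[1][0] / det, g[0][0] / det]]
    # Normal-equation solution: K = A G^{-1}
    return [[sum(a[i][k] * g_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply_operator(k, x):
    """Multiply the 2x2 matrix k by the 2-vector x."""
    return [k[0][0] * x[0] + k[0][1] * x[1],
            k[1][0] * x[0] + k[1][1] * x[1]]


def linear_dynamics_loss(k, states):
    """Mean squared one-step prediction error under the linear model."""
    total = 0.0
    for x, y in zip(states[:-1], states[1:]):
        p = apply_operator(k, x)
        total += (p[0] - y[0]) ** 2 + (p[1] - y[1]) ** 2
    return total / (len(states) - 1)


def rollout(k, x0, steps):
    """Extend a sequence by repeatedly applying K to the last known state."""
    out, x = [], x0
    for _ in range(steps):
        x = apply_operator(k, x)
        out.append(x)
    return out


# Synthetic periodic latent trajectory (a stand-in for a gait cycle).
period = 20
theta = 2 * math.pi / period
states = [[math.cos(theta * t), math.sin(theta * t)] for t in range(40)]

K = fit_linear_operator(states)
loss = linear_dynamics_loss(K, states)
future = rollout(K, states[-1], period)  # predict one full cycle ahead
```

On this noise-free toy trajectory the fitted K is a plane rotation by one time step, so the loss is numerically zero and a rollout of one full period returns to its starting state. In a learned model the states would come from a network, and a loss of this kind would be one term among others.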

URL

https://arxiv.org/abs/2308.07468

PDF

https://arxiv.org/pdf/2308.07468.pdf

