Paper Reading AI Learner

A Multi-Stage Adaptive Feature Fusion Neural Network for Multimodal Gait Recognition

2023-12-22 03:25:15
Shinan Zou, Jianbo Xiong, Chao Fan, Shiqi Yu, Jin Tang

Abstract

Gait recognition is a biometric technology that has received extensive attention. Most existing gait recognition algorithms are unimodal, and the few multimodal gait recognition algorithms perform multimodal fusion only once. Such algorithms may not fully exploit the complementary advantages of multiple modalities. In this paper, by considering the temporal and spatial characteristics of gait data, we propose a multi-stage feature fusion strategy (MSFFS), which performs multimodal fusion at different stages of the feature extraction process. We also propose an adaptive feature fusion module (AFFM) that considers the semantic association between silhouettes and skeletons: it fuses different silhouette areas with the skeleton joints most closely related to them. Since visual appearance changes and the passage of time co-occur within a gait period, we propose a multiscale spatial-temporal feature extractor (MSSTFE) to learn spatial-temporal linkage features thoroughly. Specifically, MSSTFE extracts and aggregates spatial-temporal linkage information at different spatial scales. Combining the strategy and modules above, we propose a multi-stage adaptive feature fusion (MSAFF) neural network, which achieves state-of-the-art performance in many experiments on three datasets. In addition, MSAFF is equipped with feature dimensional pooling (FD Pooling), which can significantly reduce the dimension of the gait representations without hindering accuracy.
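The abstract does not specify how AFFM matches silhouette areas to their related skeleton joints. As a rough illustration only, the sketch below assumes plain scaled dot-product cross-attention, a generic technique not taken from the paper: each silhouette part feature attends over all joint features, and the attention-weighted joint feature is added back to that part. The function `fuse_parts_with_joints` and the inputs `parts` and `joints` are hypothetical names.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_parts_with_joints(
    parts: List[Vector], joints: List[Vector]
) -> Tuple[List[Vector], List[List[float]]]:
    """Hypothetical fusion (assumption, not the paper's AFFM): each silhouette
    part attends over all skeleton joints, and the attention-weighted joint
    feature is added to the part feature (residual-style fusion)."""
    dim = len(parts[0])
    scale = 1.0 / math.sqrt(dim)  # standard scaled dot-product factor
    fused: List[Vector] = []
    weights: List[List[float]] = []
    for p in parts:
        # Similarity of this part to every joint, normalized to a distribution.
        w = softmax([dot(p, j) * scale for j in joints])
        # Weighted sum of joint features: more related joints contribute more.
        context = [sum(w_k * j[d] for w_k, j in zip(w, joints)) for d in range(dim)]
        fused.append([pd + cd for pd, cd in zip(p, context)])
        weights.append(w)
    return fused, weights


if __name__ == "__main__":
    # Toy features: two silhouette parts and two joints in a 2-D feature space.
    parts = [[1.0, 0.0], [0.0, 1.0]]
    joints = [[2.0, 0.0], [0.0, 2.0]]
    fused, weights = fuse_parts_with_joints(parts, joints)
    print(weights)
    print(fused)
```

In this toy input, each part is aligned with one joint, so each part places most of its attention weight on that joint. A learned version would apply trainable projections to the part and joint features before computing the scores.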

URL

https://arxiv.org/abs/2312.14410

PDF

https://arxiv.org/pdf/2312.14410.pdf
