Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action Recognition through Redefined Skeletal Topology Awareness

2023-05-19 06:40:12
Yuxuan Zhou, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Yifeng Geng, Xuansong Xie, Margret Keuper

Abstract

Graph Convolutional Networks (GCNs) have long defined the state of the art in skeleton-based action recognition, leveraging their ability to unravel the complex dynamics of human joint topology through the graph's adjacency matrix. However, an inherent flaw has come to light in these cutting-edge models: they tend to optimize the adjacency matrix jointly with the model weights. This process, while seemingly efficient, causes a gradual decay of bone connectivity data, culminating in a model indifferent to the very topology it sought to map. As a remedy, we propose a threefold strategy: (1) We forge an innovative pathway that encodes bone connectivity by harnessing the power of graph distances. This approach preserves the vital topological nuances often lost in conventional GCNs. (2) We highlight an oft-overlooked feature - the temporal mean of a skeletal sequence, which, despite its modest guise, carries highly action-specific information. (3) Our investigation revealed strong variations in joint-to-joint relationships across different actions. This finding exposes the limitations of a single adjacency matrix in capturing the variations of relational configurations emblematic of human movement, which we remedy by proposing an efficient refinement to Graph Convolutions (GC) - the BlockGC. This evolution slashes parameters by a substantial margin (above 40%), while elevating performance beyond original GCNs. Our full model, the BlockGCN, establishes new standards in skeleton-based action recognition for small model sizes. Its high accuracy, notably on the large-scale NTU RGB+D 120 dataset, stands as compelling proof of the efficacy of BlockGCN. The source code and model can be found at this https URL.
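
Points (1) and (2) of the abstract can be made concrete with a small sketch. It is not the authors' implementation: the abstract does not say how graph distances or the temporal mean are fed into the network, so the code below only computes the two raw quantities. The function names `hop_distances` and `temporal_mean`, and the 5-joint toy skeleton, are hypothetical and chosen for illustration.

```python
from collections import deque


def hop_distances(num_joints, bones):
    """All-pairs shortest-path (hop) distances on an undirected skeleton graph.

    `bones` is a list of (joint_a, joint_b) index pairs. Unreachable pairs
    keep the value -1.
    """
    neighbors = [[] for _ in range(num_joints)]
    for a, b in bones:
        neighbors[a].append(b)
        neighbors[b].append(a)

    dist = [[-1] * num_joints for _ in range(num_joints)]
    for src in range(num_joints):
        # Breadth-first search from each joint gives hop counts directly.
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[src][v] == -1:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def temporal_mean(sequence):
    """Average every joint coordinate over time.

    `sequence[t][j]` is an (x, y, z) tuple for joint j at frame t.
    Returns one (x, y, z) tuple per joint.
    """
    num_frames = len(sequence)
    num_joints = len(sequence[0])
    return [
        tuple(
            sum(sequence[t][j][c] for t in range(num_frames)) / num_frames
            for c in range(3)
        )
        for j in range(num_joints)
    ]


# Hypothetical toy skeleton (an assumption, not a real dataset layout):
# 0 = hip, 1 = spine, 2 = head, 3 = left hand, 4 = right hand
TOY_BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]
D = hop_distances(5, TOY_BONES)

# Two frames of a single joint, to show the temporal mean.
MEAN = temporal_mean([[(0.0, 0.0, 0.0)], [(2.0, 4.0, 6.0)]])
```

Unlike a learned adjacency matrix, the distance matrix `D` is fixed by the bone list, so the connectivity information it carries cannot drift during training. In the toy skeleton the two hands are two hops apart (`D[3][4] == 2`) even though no bone joins them directly.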

URL

https://arxiv.org/abs/2305.11468

PDF

https://arxiv.org/pdf/2305.11468.pdf

