Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action Recognition through Redefined Skeletal Topology Awareness

Abstract

Graph Convolutional Networks (GCNs) have long defined the state of the art in skeleton-based action recognition, leveraging their ability to unravel the complex dynamics of human joint topology through the graph's adjacency matrix. However, an inherent flaw has come to light in these cutting-edge models: they tend to optimize the adjacency matrix jointly with the model weights. This process, while seemingly efficient, causes a gradual decay of bone connectivity data, culminating in a model indifferent to the very topology it sought to map. As a remedy, we propose a threefold strategy: (1) We encode bone connectivity using graph distances. This approach preserves the topological information that is often lost in conventional GCNs. (2) We highlight an oft-overlooked feature: the temporal mean of a skeletal sequence, which, despite its simplicity, carries highly action-specific information. (3) Our investigation revealed strong variations in joint-to-joint relationships across different actions. This finding exposes the limitations of a single adjacency matrix in capturing the varied relational configurations characteristic of human movement, which we remedy by proposing an efficient refinement to Graph Convolutions (GC), the BlockGC. This refinement reduces the parameter count by a substantial margin (above 40%) while raising performance beyond that of the original GCNs. Our full model, the BlockGCN, establishes new standards in skeleton-based action recognition for small model sizes. Its high accuracy, notably on the large-scale NTU RGB+D 120 dataset, stands as compelling proof of the efficacy of BlockGCN. The source code and model are publicly available.
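
To make points (1) and (2) concrete, the following is a minimal sketch and not the authors' implementation. It assumes that "graph distance" means the shortest-path hop count between joints along bones, and that the temporal mean is a per-joint average of coordinates over frames. The 5-joint skeleton, the joint names, and the helper names hop_distances and temporal_mean are hypothetical and chosen only for illustration.

```python
from collections import deque


def hop_distances(num_joints, bones):
    """All-pairs shortest-path (hop) distances on an undirected skeleton graph.

    bones is a list of (joint_a, joint_b) index pairs. Unreachable pairs
    keep the value -1.
    """
    neighbors = [[] for _ in range(num_joints)]
    for a, b in bones:
        neighbors[a].append(b)
        neighbors[b].append(a)

    dist = [[-1] * num_joints for _ in range(num_joints)]
    for src in range(num_joints):
        # Breadth-first search from each joint gives hop counts directly.
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[src][v] < 0:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def temporal_mean(sequence):
    """Average each joint's (x, y, z) position over all frames.

    sequence[t][j] is the coordinate triple of joint j at frame t.
    """
    num_frames = len(sequence)
    num_joints = len(sequence[0])
    return [
        tuple(sum(frame[j][k] for frame in sequence) / num_frames for k in range(3))
        for j in range(num_joints)
    ]


# Hypothetical toy skeleton: 0=hip, 1=spine, 2=head, 3=left hand, 4=right hand.
TOY_BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]
DISTANCES = hop_distances(5, TOY_BONES)

# Two frames of the same 5 joints (made-up coordinates).
TOY_SEQUENCE = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0), (-1.0, 1.0, 0.0), (1.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0), (-1.0, 2.0, 0.0), (1.0, 2.0, 0.0)],
]
MEAN_POSE = temporal_mean(TOY_SEQUENCE)
```

Unlike a learned adjacency matrix, the hop-distance matrix is fixed by the skeleton and is not changed by training, which is one way topology information could be kept available to the model. How the distances and the mean pose are actually fed into the network is not specified in the abstract.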

URL

https://arxiv.org/abs/2305.11468

PDF

https://arxiv.org/pdf/2305.11468.pdf