Abstract
Skeleton-based action recognition is vital for comprehending human-centric videos and has applications in diverse domains. One of its challenges is dealing with low-quality data, such as skeletons with missing or inaccurate joints. This paper addresses action recognition from low-quality skeletons through a general knowledge distillation framework. The framework employs a teacher-student setup in which a teacher model trained on high-quality skeletons guides the learning of a student model that handles low-quality skeletons. To bridge the gap between heterogeneous high-quality and low-quality skeletons, we present a novel part-based skeleton matching strategy that exploits shared body parts to facilitate local action pattern learning. An action-specific part matrix is developed to emphasize the critical parts for different actions, enabling the student model to distill discriminative part-level knowledge. A novel part-level multi-sample contrastive loss transfers knowledge from multiple high-quality skeletons to low-quality ones, so the framework can also train on low-quality skeletons that have no corresponding high-quality match. Comprehensive experiments on the NTU-RGB+D, Penn Action, and SYSU 3D HOI datasets demonstrate the effectiveness of the proposed knowledge distillation framework.
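The abstract does not state the form of the part-level multi-sample contrastive loss. As a rough illustration of the idea only, the sketch below assumes a generic multi-positive InfoNCE objective computed per body part: a low-quality (student) sample is pulled toward every high-quality (teacher) sample of the same action and pushed away from the rest, and the per-part losses are combined with action-specific part weights. The function names, the cosine similarity, the temperature `tau`, and the weighting scheme are all hypothetical choices made for this example, not the paper's formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def part_contrastive_loss(student_parts, teacher_parts, teacher_labels,
                          label, part_weights, tau=0.1):
    """Hypothetical part-level multi-positive InfoNCE loss for one student sample.

    student_parts:  list of P part features from the low-quality skeleton.
    teacher_parts:  list of K teacher samples, each a list of P part features.
    teacher_labels: action label of each teacher sample.
    label:          action label of the student sample.
    part_weights:   P weights standing in for one row of an
                    action-specific part matrix (an assumption).
    """
    # Positives are all teacher samples that share the student's action label.
    positives = [k for k, y in enumerate(teacher_labels) if y == label]
    if not positives:
        return 0.0

    total = 0.0
    for p, weight in enumerate(part_weights):
        # Similarity of the student's part p to part p of every teacher sample.
        logits = [cosine(student_parts[p], t[p]) / tau for t in teacher_parts]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Average negative log-probability over all positives.
        part_loss = -sum(logits[k] - log_denom for k in positives) / len(positives)
        total += weight * part_loss
    return total


# Toy example: two body parts, three teacher samples (two of action 0, one of action 1).
teachers = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.1], [0.1, 1.0]],
    [[-1.0, 0.0], [0.0, -1.0]],
]
labels = [0, 0, 1]
student = [[1.0, 0.0], [0.0, 1.0]]
loss = part_contrastive_loss(student, teachers, labels, label=0,
                             part_weights=[0.5, 0.5])
```

Because the loss averages over all same-action teacher samples rather than one paired sample, a low-quality skeleton needs no one-to-one high-quality counterpart, which is the property the abstract highlights.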
URL
https://arxiv.org/abs/2404.18206