Abstract
Skeleton-based action recognition has gained considerable traction because skeletal representations are compact and robust. However, current methods tend to rely on a single backbone to model the skeleton modality, so their performance can be limited by the inherent weaknesses of that backbone. To address this and fully leverage the complementary characteristics of different network architectures, we propose a novel Hybrid Dual-Branch Network (HDBN) for robust skeleton-based action recognition. HDBN benefits from the proficiency of graph convolutional networks (GCNs) in handling graph-structured data and from the strong capability of Transformers to model global information. In detail, HDBN is divided into two trunk branches: MixGCN and MixFormer. The two branches use GCNs and Transformers, respectively, to model both 2D and 3D skeletal modalities. HDBN emerged as one of the top solutions in the Multi-Modal Video Reasoning and Analyzing Competition (MMVRAC) of the 2024 ICME Grand Challenge, achieving accuracies of 47.95% and 75.36% on two benchmarks of the UAV-Human dataset and outperforming most existing methods. Our code will be publicly available at: this https URL.
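The abstract does not say how the outputs of the two branches are combined. As a rough illustration only, the sketch below assumes a simple score-level (late) fusion: each branch produces per-class logits, these are turned into probabilities, and a weighted average decides the predicted class. The function names `fuse_branches` and `predict` and the `gcn_weight` parameter are hypothetical and are not taken from the HDBN code.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_branches(
    gcn_logits: Sequence[float],
    former_logits: Sequence[float],
    gcn_weight: float = 0.5,
) -> list[float]:
    """Weighted average of per-branch class probabilities.

    Assumption: this fusion rule is illustrative, not the paper's method.
    """
    if len(gcn_logits) != len(former_logits):
        raise ValueError("both branches must score the same set of classes")
    p_gcn = softmax(gcn_logits)
    p_former = softmax(former_logits)
    return [
        gcn_weight * a + (1.0 - gcn_weight) * b
        for a, b in zip(p_gcn, p_former)
    ]


def predict(
    gcn_logits: Sequence[float],
    former_logits: Sequence[float],
    gcn_weight: float = 0.5,
) -> int:
    """Return the index of the class with the highest fused probability."""
    fused = fuse_branches(gcn_logits, former_logits, gcn_weight)
    return max(range(len(fused)), key=fused.__getitem__)
```

With equal weights, the branch that is more confident dominates the fused decision; setting `gcn_weight` to 1.0 or 0.0 reduces the sketch to a single-branch prediction, which is the single-backbone setting the abstract argues against.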
URL
https://arxiv.org/abs/2404.15719