Paper Reading AI Learner

Skeleton-based Human Action Recognition via Convolutional Neural Networks

2023-01-31 01:26:17
Ayman Ali, Ekkasit Pinyoanuntapong, Pu Wang, Mohsen Dorodchi

Abstract

Interest in skeleton-based action recognition has grown markedly within the research community, owing to its computational efficiency, representative features, and illumination invariance. Even so, researchers are still investigating how best to represent human actions through skeleton representations and the features extracted from them. As a result, the number and availability of human action recognition datasets have risen substantially. In addition, deep learning-based algorithms have gained widespread popularity due to remarkable advances in various computer vision tasks. Most state-of-the-art contributions in skeleton-based action recognition incorporate a Graph Convolutional Network (GCN) architecture for representing the human body and extracting features. Our research demonstrates that Convolutional Neural Networks (CNNs) can attain results comparable to GCNs, provided that the proper training techniques, augmentations, and optimizers are applied. Our approach has been rigorously validated, and we achieve a score of 95% on the NTU-60 dataset.

URL

https://arxiv.org/abs/2301.13360

PDF

https://arxiv.org/pdf/2301.13360.pdf

