Paper Reading AI Learner

LIGAR: Lightweight General-purpose Action Recognition

2021-08-30 12:23:43
Evgeny Izutov

Abstract

Growing amount of different practical tasks in a video understanding problem has addressed the great challenge aiming to design an universal solution, which should be available for broad masses and suitable for the demanding edge-oriented inference. In this paper we are focused on designing a network architecture and a training pipeline to tackle the mentioned challenges. Our architecture takes the best from the previous ones and brings the ability to be successful not only in appearance-based action recognition tasks but in motion-based problems too. Furthermore, the induced label noise problem is formulated and Adaptive Clip Selection (ACS) framework is proposed to deal with it. Together it makes the LIGAR framework the general-purpose action recognition solution. We also have reported the extensive analysis on the general and gesture datasets to show the excellent trade-off between the performance and the accuracy in comparison to the state-of-the-art solutions. Training code is available at: this https URL. For the efficient edge-oriented inference all trained models can be exported into the OpenVINO format.

Abstract (translated)

URL

https://arxiv.org/abs/2108.13153

PDF

https://arxiv.org/pdf/2108.13153.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot