Paper Reading AI Learner

One for All: An End-to-End Compact Solution for Hand Gesture Recognition

2021-05-15 05:10:47
Monu Verma, Ayushi Gupta, santosh kumar Vipparthi

Abstract

The HGR is a quite challenging task as its performance is influenced by various aspects such as illumination variations, cluttered backgrounds, spontaneous capture, etc. The conventional CNN networks for HGR are following two stage pipeline to deal with the various challenges: complex signs, illumination variations, complex and cluttered backgrounds. The existing approaches needs expert expertise as well as auxiliary computation at stage 1 to remove the complexities from the input images. Therefore, in this paper, we proposes an novel end-to-end compact CNN framework: fine grained feature attentive network for hand gesture recognition (Fit-Hand) to solve the challenges as discussed above. The pipeline of the proposed architecture consists of two main units: FineFeat module and dilated convolutional (Conv) layer. The FineFeat module extracts fine grained feature maps by employing attention mechanism over multiscale receptive fields. The attention mechanism is introduced to capture effective features by enlarging the average behaviour of multi-scale responses. Moreover, dilated convolution provides global features of hand gestures through a larger receptive field. In addition, integrated layer is also utilized to combine the features of FineFeat module and dilated layer which enhances the discriminability of the network by capturing complementary context information of hand postures. The effectiveness of Fit- Hand is evaluated by using subject dependent (SD) and subject independent (SI) validation setup over seven benchmark datasets: MUGD-I, MUGD-II, MUGD-III, MUGD-IV, MUGD-V, Finger Spelling and OUHANDS, respectively. Furthermore, to investigate the deep insights of the proposed Fit-Hand framework, we performed ten ablation study.

Abstract (translated)

URL

https://arxiv.org/abs/2105.07143

PDF

https://arxiv.org/pdf/2105.07143.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot