Paper Reading AI Learner

Egocentric Gesture Recognition for Head-Mounted AR devices

2018-08-16 09:00:56
Tejo Chalasani, Jan Ondrej, Aljosa Smolic

Abstract

Natural interaction with virtual objects in AR/VR environments makes for a smooth user experience. Gestures are a natural extension from the real world to augmented space to achieve these interactions. Finding discriminating spatio-temporal features relevant to gestures and hands in the ego-view is the primary challenge in recognising egocentric gestures. In this work we propose a data-driven, end-to-end deep learning approach to egocentric gesture recognition, which combines an ego-hand encoder network that finds ego-hand features with a recurrent neural network that discerns temporally discriminating features. Since deep learning networks are data-intensive, we propose a novel data augmentation technique using green screen capture to alleviate the problem of ground truth annotation. In addition, we publish a dataset of 10 gestures performed in a natural fashion in front of a green screen for training, and the same 10 gestures performed in different natural scenes without a green screen for validation. We also present our network's performance in comparison to the state of the art on the AirGest dataset.
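
The abstract does not include code, so the sketch below is only an illustration of the kind of pipeline it describes: a per-frame ego-hand encoder (here a small CNN) whose features are fed to a recurrent layer (here a GRU) that classifies the gesture over time. The module names, layer sizes, and shapes are assumptions for illustration, not the authors' actual architecture; only the 10-class output follows the dataset described above.

```python
# Illustrative sketch only -- NOT the authors' architecture.
# Assumes PyTorch; layer sizes are arbitrary choices for demonstration.
import torch
import torch.nn as nn

class EgoHandEncoder(nn.Module):
    """Small per-frame CNN that maps an RGB frame to a feature vector."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, x):               # x: (B, 3, H, W)
        f = self.conv(x).flatten(1)     # (B, 64)
        return self.fc(f)               # (B, feat_dim)

class GestureRNN(nn.Module):
    """Per-frame encoder features -> GRU over time -> gesture logits."""
    def __init__(self, feat_dim=128, hidden=256, num_classes=10):
        super().__init__()
        self.encoder = EgoHandEncoder(feat_dim)
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clip):                        # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.encoder(clip.flatten(0, 1))    # (B*T, feat_dim)
        feats = feats.view(b, t, -1)                # (B, T, feat_dim)
        _, h = self.rnn(feats)                      # h: (1, B, hidden)
        return self.head(h[-1])                     # (B, num_classes)

logits = GestureRNN()(torch.randn(2, 16, 3, 112, 112))   # -> shape (2, 10)
```

Likewise, the green-screen augmentation can reasonably be read as chroma-keying the captured foreground and compositing it over varied backgrounds to generate training scenes without manual annotation. The minimal NumPy/OpenCV sketch below assumes exactly that; the HSV thresholds and the helper name `composite_over_background` are illustrative, not taken from the paper.

```python
# Minimal chroma-key compositing sketch (an assumption about how green-screen
# capture could be used for augmentation; not the paper's exact pipeline).
import cv2
import numpy as np

def composite_over_background(frame_bgr, background_bgr,
                              lower=(35, 60, 60), upper=(85, 255, 255)):
    """Replace green-screen pixels in `frame_bgr` with `background_bgr`."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    green = cv2.inRange(hsv, np.array(lower), np.array(upper))   # 255 where green
    bg = cv2.resize(background_bgr, (frame_bgr.shape[1], frame_bgr.shape[0]))
    keep_fg = (green == 0)[..., None]                            # True on the hand/foreground
    return np.where(keep_fg, frame_bgr, bg)
```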

URL

https://arxiv.org/abs/1808.05380

PDF

https://arxiv.org/pdf/1808.05380.pdf
