Paper Reading AI Learner

Learning 3D Navigation Protocols on Touch Interfaces with Cooperative Multi-Agent Reinforcement Learning

2019-04-16 16:33:04
Quentin Debard, Jilles Steeve Dibangoye, Stéphane Canu, Christian Wolf

Abstract

Using touch devices to navigate virtual 3D environments such as computer-aided design (CAD) models or geographical information systems (GIS) is inherently difficult for humans, as 3D operations must be performed by the user on a 2D touch surface. This ill-posed problem is classically solved with a fixed, handcrafted interaction protocol, which the user must learn. We propose to automatically learn a new interaction protocol that maps 2D user input to 3D actions in virtual environments using reinforcement learning (RL). A fundamental problem of RL methods is the vast number of interactions they often require, which are difficult to come by when humans are involved. To overcome this limitation, we make use of two collaborative agents. The first agent models the human by learning to perform the 2D finger trajectories. The second agent acts as the interaction protocol, interpreting the 2D finger trajectories from the first agent and translating them into 3D operations. We restrict the learned 2D trajectories to be similar to a training set of collected human gestures by performing state representation learning prior to reinforcement learning. This state representation learning is addressed by projecting the gestures into a latent space learned by a variational autoencoder (VAE).
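The state representation step described above can be sketched as follows. This is a minimal NumPy illustration of projecting flattened 2D gesture trajectories into a VAE latent space, not the authors' implementation: the trajectory length, latent dimensionality, and linear encoder/decoder are all assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

TRAJ_LEN = 16            # points (x, y) per finger trajectory (assumed)
INPUT_DIM = TRAJ_LEN * 2
LATENT_DIM = 4           # size of the latent gesture space (assumed)

# Encoder: linear maps to the mean and log-variance of q(z | gesture).
W_mu = rng.normal(0, 0.1, (INPUT_DIM, LATENT_DIM))
W_logvar = rng.normal(0, 0.1, (INPUT_DIM, LATENT_DIM))
# Decoder: linear map from latent space back to trajectory space.
W_dec = rng.normal(0, 0.1, (LATENT_DIM, INPUT_DIM))

def encode(x):
    """Return mean and log-variance of the latent posterior for gestures x."""
    return x @ W_mu, x @ W_logvar

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps (reparameterization trick)."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    """Reconstruct a flattened 2D trajectory from a latent code."""
    return z @ W_dec

def elbo_terms(x):
    """Reconstruction error and KL divergence to the unit Gaussian prior."""
    mu, logvar = encode(x)
    z = reparameterize(mu, logvar)
    recon_err = np.mean((decode(z) - x) ** 2)
    kl = -0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar))
    return recon_err, kl

# Fake batch of flattened trajectories standing in for recorded human gestures.
gestures = rng.normal(size=(8, INPUT_DIM))
recon_err, kl = elbo_terms(gestures)
```

In the paper's setup, the first agent would then act in this latent space, so that any trajectory it produces decodes to something resembling a human gesture from the training set.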

URL

https://arxiv.org/abs/1904.07802

PDF

https://arxiv.org/pdf/1904.07802.pdf

