Paper Reading AI Learner

VR-Goggles for Robots: Real-to-sim Domain Adaptation for Visual Control

2018-06-25 04:26:35
Jingwei Zhang, Lei Tai, Yufeng Xiong, Peng Yun, Ming Liu, Joschka Boedecker, Wolfram Burgard

Abstract

In this paper, we deal with the reality gap from a novel perspective, targeting the transfer of Deep Reinforcement Learning (DRL) policies learned in simulated environments to the real-world domain for visual control tasks. Instead of adopting the common solution of increasing the visual fidelity of the synthetic images output by simulators during the training phase, we tackle the problem by translating the real-world image streams back to the synthetic domain during the deployment phase, to make the robot feel at home. We propose this as a lightweight, flexible, and efficient solution for visual control, as 1) no extra transfer steps are required during the expensive training of DRL agents in simulation; 2) the trained DRL agents are not constrained to deployment in only one specific real-world environment; 3) the policy training and the transfer operations are decoupled and can be conducted in parallel. In addition, we propose a simple yet effective shift loss that constrains the consistency between consecutive frames, which is important for consistent policy outputs. We validate the shift loss on artistic style transfer for videos and on domain adaptation, and we validate our visual control approach in both indoor and outdoor robotics experiments. A video of our results is available at: https://goo.gl/P76TTo.
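As a rough illustration of how such a frame-consistency constraint could be set up, the sketch below penalizes a translation network whose output does not shift together with its input. This is only a minimal interpretation of the shift loss described in the abstract, assuming a fully convolutional, resolution-preserving PyTorch generator; the helper names (shift_crop, shift_loss) and the shift magnitudes are illustrative assumptions, not taken from the paper.

    import torch
    import torch.nn.functional as F

    def shift_crop(x, dx, dy):
        # Approximate a pixel shift of a (B, C, H, W) batch by cropping,
        # keeping only the region that overlaps after the shift.
        _, _, h, w = x.shape
        return x[:, :, max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)]

    def shift_loss(generator, x, dx=4, dy=4):
        # Hypothetical shift-consistency loss: translating the input by a few
        # pixels should translate the generated (real-to-sim) image by the
        # same amount, which discourages flicker between consecutive frames.
        y = generator(x)                                      # G(x)
        y_of_shifted_x = generator(shift_crop(x, dx, dy))     # G(shift(x))
        shifted_y = shift_crop(y, dx, dy)                     # shift(G(x))
        # Crop to a common size in case the generator changes the resolution.
        h = min(y_of_shifted_x.shape[2], shifted_y.shape[2])
        w = min(y_of_shifted_x.shape[3], shifted_y.shape[3])
        return F.mse_loss(y_of_shifted_x[:, :, :h, :w],
                          shifted_y[:, :, :h, :w])

In a typical setup, such a term would simply be added, with a small weight, to the translation network's other objectives (e.g. adversarial or reconstruction losses) during training.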

URL

https://arxiv.org/abs/1802.00265

PDF

https://arxiv.org/pdf/1802.00265.pdf

