Paper Reading AI Learner

Few-Shot Image-to-Semantics Translation for Policy Transfer in Reinforcement Learning

2023-01-31 00:28:18
Rei Sato, Kazuto Fukuchi, Jun Sakuma, Youhei Akimoto

Abstract

We investigate policy transfer using image-to-semantics translation to mitigate learning difficulties in vision-based robotics control agents. This problem assumes two environments: a simulator environment whose state space is semantics, that is, low-dimensional and essential information, and a real-world environment whose state space is images. By learning a mapping from images to semantics, we can transfer a policy, pre-trained in the simulator, to the real world, thereby eliminating costly and risky on-policy agent interactions in the real world. In addition, image-to-semantics mapping is advantageous over other sim-to-real transfer strategies in terms of the computational efficiency of policy training and the interpretability of the obtained policy. To tackle the main difficulty in learning image-to-semantics mapping, namely the human annotation cost of producing a training dataset, we propose two techniques: pair augmentation with the transition function of the simulator environment and active learning. We observed a reduction in the annotation cost without a decline in transfer performance, and the proposed approach outperformed the existing annotation-free approach.
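The pipeline described above can be sketched in a few lines. This is a hypothetical toy illustration, not the paper's implementation: it assumes a linear "rendering" from semantics to images so that ridge regression can serve as the image-to-semantics translator, a stand-in policy pre-trained on semantics, and a simple ensemble-disagreement criterion as a generic flavour of active learning (the paper's actual selection rule may differ).

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Synthetic setup: semantics s in R^2, images o in R^16 (assumed) -------
D_SEM, D_IMG = 2, 16
render = rng.normal(size=(D_SEM, D_IMG))  # unknown "camera": s -> o

def observe(s):
    """Render semantics into a noisy image observation."""
    return s @ render + 0.01 * rng.normal(size=D_IMG)

# --- Step 1: annotate a few images with semantics (the costly step) --------
s_train = rng.uniform(-1, 1, size=(8, D_SEM))          # few-shot labels
o_train = np.array([observe(s) for s in s_train])

# --- Step 2: learn the image-to-semantics mapping f: o -> s ----------------
lam = 1e-3                                              # ridge regularizer
W = np.linalg.solve(o_train.T @ o_train + lam * np.eye(D_IMG),
                    o_train.T @ s_train)                # shape (D_IMG, D_SEM)

def image_to_semantics(o):
    return o @ W

# --- Step 3: compose with a policy pre-trained in the simulator ------------
def sim_policy(s):
    """Stand-in for a simulator-trained policy: drive the state to zero."""
    return -s

def transferred_policy(o):
    """The real-world policy: translate the image, then act on semantics."""
    return sim_policy(image_to_semantics(o))

s_true = np.array([0.5, -0.3])
a = transferred_policy(observe(s_true))   # action from the image alone

# --- Active-learning flavour: query the image the mapping is least sure of -
pool = np.array([observe(rng.uniform(-1, 1, size=D_SEM)) for _ in range(50)])
boot_W = []
for _ in range(5):                                      # bootstrap ensemble
    idx = rng.integers(0, len(o_train), size=len(o_train))
    Ob, Sb = o_train[idx], s_train[idx]
    boot_W.append(np.linalg.solve(Ob.T @ Ob + lam * np.eye(D_IMG), Ob.T @ Sb))
preds = np.stack([pool @ Wb for Wb in boot_W])          # (5, 50, D_SEM)
disagreement = preds.std(axis=0).sum(axis=1)
query = int(np.argmax(disagreement))                    # image to annotate next
```

The key point is Step 3: the real-world agent never learns on-policy; it only composes the learned translator with the frozen simulator policy, so annotation of a small image/semantics set is the entire real-world cost.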

URL

https://arxiv.org/abs/2301.13343

PDF

https://arxiv.org/pdf/2301.13343.pdf

