Paper Reading AI Learner

Exploration via Flow-Based Intrinsic Rewards

2019-05-24 07:32:30
Hsuan-Kung Yang, Po-Han Chiang, Min-Fong Hong, Chun-Yi Lee

Abstract

Exploration bonuses derived from the novelty of observations have become a popular approach to motivating exploration in reinforcement learning (RL) agents in recent years. Recent methods, such as curiosity-driven exploration, typically estimate the novelty of new observations through the prediction errors of their system dynamics models. In this paper, we introduce the concept of optical flow estimation from the field of computer vision to the RL domain and utilize the errors from optical flow estimation to evaluate the novelty of new observations. We propose a flow-based intrinsic curiosity module (FICM) capable of learning motion features and understanding observations in a more comprehensive and efficient fashion. We evaluate our method against a number of baselines on several benchmark environments, including Atari games, Super Mario Bros., and ViZDoom. Our results show that the proposed method outperforms the baselines in certain environments, especially those featuring sophisticated moving patterns or high-dimensional observation spaces. We further analyze the hyper-parameters used in training and discuss our insights into them.
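For intuition, here is a minimal sketch of how such a flow-based intrinsic reward could be computed. It assumes a toy convolutional flow predictor and a warping-based reconstruction loss; the class name `FlowBasedCuriosity`, the layer sizes, and the loss choice are illustrative assumptions for this note, not the architecture described in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlowBasedCuriosity(nn.Module):
    """Sketch of a flow-based intrinsic curiosity signal (hypothetical).

    A small network predicts optical flow between two consecutive
    observations; the previous frame is warped with the predicted flow,
    and the reconstruction error against the current frame serves as
    the intrinsic reward. Observations whose motion the flow predictor
    has not yet learned to model yield large errors, i.e. large bonuses.
    """

    def __init__(self, in_channels=3):
        super().__init__()
        # Encoder over the stacked frame pair -> 2-channel flow field
        # (flow expressed in normalized [-1, 1] grid coordinates).
        self.flow_net = nn.Sequential(
            nn.Conv2d(in_channels * 2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),
        )

    def forward(self, obs_prev, obs_curr):
        # Predict flow from the concatenated consecutive frames.
        flow = self.flow_net(torch.cat([obs_prev, obs_curr], dim=1))

        # Identity sampling grid shifted by the predicted flow.
        n, _, h, w = obs_prev.shape
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=flow.device),
            torch.linspace(-1, 1, w, device=flow.device),
            indexing="ij",
        )
        base = torch.stack([xs, ys], dim=-1).expand(n, h, w, 2)
        grid = base + flow.permute(0, 2, 3, 1)

        # Warp the previous frame toward the current one.
        warped = F.grid_sample(obs_prev, grid, align_corners=True)

        # Per-sample reconstruction error doubles as the training loss
        # for the flow network and the intrinsic reward for the agent.
        err = F.mse_loss(warped, obs_curr, reduction="none")
        return err.mean(dim=(1, 2, 3))

ficm = FlowBasedCuriosity()
prev_frames = torch.rand(8, 3, 84, 84)   # batch of previous observations
curr_frames = torch.rand(8, 3, 84, 84)   # batch of current observations
intrinsic_reward = ficm(prev_frames, curr_frames)  # shape: (8,)
```

The design point the abstract highlights is that novelty is measured by how well the motion between consecutive observations can be estimated, rather than by the forward-prediction error of a system dynamics model as in earlier curiosity-driven methods.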

URL

https://arxiv.org/abs/1905.10071

PDF

https://arxiv.org/pdf/1905.10071.pdf

