
Deep Laplacian-based Options for Temporally-Extended Exploration

2023-01-26 15:45:39
Martin Klissarov, Marlos C. Machado

Abstract

Selecting exploratory actions that generate a rich stream of experience for better learning is a fundamental challenge in reinforcement learning (RL). One approach to this problem is to select actions according to specific policies for an extended period of time, also known as options. A recent line of work derives such exploratory options from the eigenfunctions of the graph Laplacian. Importantly, until now these methods have been mostly limited to tabular domains where (1) the graph Laplacian matrix is either given or can be fully estimated, (2) performing an eigendecomposition of this matrix is computationally tractable, and (3) value functions can be learned exactly. Additionally, these methods require a separate option discovery phase. These assumptions are fundamentally not scalable. In this paper we address these limitations and show how recent results for directly approximating the eigenfunctions of the Laplacian can be leveraged to truly scale up options-based exploration. To do so, we introduce a fully online deep RL algorithm for discovering Laplacian-based options and evaluate our approach on a variety of pixel-based tasks. We compare against several state-of-the-art exploration methods and show that our approach is effective, general, and especially promising in non-stationary settings.
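For context, below is a minimal sketch of how eigenfunctions of the graph Laplacian can be approximated directly with a neural encoder, in the spirit of the graph-drawing objectives this line of work builds on. It is not the paper's exact objective: the function name laplacian_rep_loss, the coefficient beta, and the use of PyTorch are illustrative assumptions. An attractive term pulls the embeddings of consecutive states together (estimating the Dirichlet energy), while an orthonormality penalty keeps the learned dimensions from collapsing onto each other.

```python
import torch

def laplacian_rep_loss(f_s, f_next, f_u, f_v, beta=5.0):
    """Monte Carlo estimate of a spectral graph-drawing objective.

    f_s, f_next: encoder outputs for consecutive states, shape (batch, d)
    f_u, f_v:    encoder outputs for states sampled independently from the
                 state distribution, shape (batch, d)
    beta:        weight of the orthonormality penalty (illustrative value)
    """
    d = f_s.shape[1]
    # Attractive term: states connected by a transition should embed close
    # together; this estimates the Dirichlet energy of the embedding.
    attract = ((f_s - f_next) ** 2).sum(dim=1).mean()
    # Repulsive term: penalizes deviation of <f_j, f_k> from delta_jk under
    # the state distribution, expanded so it can be estimated from pairs:
    # E[(f(u).f(v))^2] - E[||f(u)||^2] - E[||f(v)||^2] + d.
    inner = (f_u * f_v).sum(dim=1)
    ortho = ((inner ** 2).mean()
             - (f_u ** 2).sum(dim=1).mean()
             - (f_v ** 2).sum(dim=1).mean()
             + d)
    return attract + beta * ortho
```

Given such an encoder f, Laplacian-based (eigen)options are typically obtained by treating each learned dimension as an intrinsic reward, e.g. r(s, s') = f_k(s') - f_k(s), and training an option policy to maximize it, terminating where that reward is no longer positive; how the paper turns this into a fully online discovery procedure is detailed in the PDF linked below.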

URL

https://arxiv.org/abs/2301.11181

PDF

https://arxiv.org/pdf/2301.11181.pdf

