Using Curiosity for an Even Representation of Tasks in Continual Offline Reinforcement Learning

2023-12-05 22:53:05
Pankayaraj Pathmanathan, Natalia Díaz-Rodríguez, Javier Del Ser

Abstract

In this work, we investigate how curiosity can be applied to replay buffers to improve offline multi-task continual reinforcement learning when tasks, which are defined by the non-stationarity in the environment, are unlabeled and not evenly exposed to the learner over time. In particular, we investigate the use of curiosity both as a tool for task boundary detection and as a priority metric for retaining old transition tuples, and we use each of these roles to propose a different buffer. First, we propose a Hybrid Reservoir Buffer with Task Separation (HRBTS), where curiosity is used to detect task boundaries that are unknown because of the task-agnostic nature of the problem. Second, we propose a Hybrid Curious Buffer (HCB), where curiosity serves as the priority metric for retaining old transition tuples. We show that these buffers, in conjunction with regular reinforcement learning algorithms, can alleviate the catastrophic forgetting suffered by state-of-the-art replay buffers when the agent's exposure to tasks is not equal over time. We evaluate catastrophic forgetting and the efficiency of our proposed buffers against recent works such as the Hybrid Reservoir Buffer (HRB) and the Multi-Time Scale Replay Buffer (MTR) in three different continual reinforcement learning settings. Experiments were carried out on classical control tasks and the Metaworld environment. They show that our proposed replay buffers are more resistant to catastrophic forgetting than existing works in most of the settings.
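The abstract does not give the detection rule or the buffer internals, so the following is only a rough sketch of the general idea behind task separation, not the authors' HRBTS implementation. It assumes a scalar curiosity signal is already available for each transition (commonly a forward-model prediction error), uses standard reservoir sampling for each buffer, and opens a new buffer when curiosity jumps well above its recent average. The class names, the window size, and the threshold ratio are all hypothetical choices made for illustration.

```python
import random
from collections import deque


class ReservoirBuffer:
    """Uniform reservoir sampling (Algorithm R) over a stream of transitions."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # number of transitions offered so far
        self.rng = random.Random(seed)

    def add(self, transition):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(transition)
            return
        # Keep the new transition with probability capacity / seen,
        # overwriting a uniformly chosen stored one.
        j = self.rng.randrange(self.seen)
        if j < self.capacity:
            self.items[j] = transition


class CuriosityBoundaryDetector:
    """Flags a possible task boundary on a curiosity spike.

    Hypothetical rule: a boundary is reported when the current curiosity
    exceeds `ratio` times the mean of the last `window` values.
    """

    def __init__(self, window=20, ratio=3.0):
        self.recent = deque(maxlen=window)
        self.ratio = ratio

    def update(self, curiosity):
        boundary = False
        # Only test once the window is full, to avoid noisy early estimates.
        if len(self.recent) == self.recent.maxlen:
            mean = sum(self.recent) / len(self.recent)
            boundary = curiosity > self.ratio * max(mean, 1e-8)
        if boundary:
            self.recent.clear()  # restart statistics for the new task
        self.recent.append(curiosity)
        return boundary


def route_stream(curiosities, transitions, capacity=100):
    """Stores each transition in the buffer of the currently detected task."""
    detector = CuriosityBoundaryDetector()
    buffers = [ReservoirBuffer(capacity)]
    for curiosity, transition in zip(curiosities, transitions):
        if detector.update(curiosity):
            buffers.append(ReservoirBuffer(capacity, seed=len(buffers)))
        buffers[-1].add(transition)
    return buffers
```

Calling `route_stream` on a stream whose curiosity is flat except for a single spike yields two buffers, one per detected segment. A curiosity-prioritised variant in the spirit of HCB would replace the uniform overwrite in `ReservoirBuffer.add` with a rule that prefers to keep high-curiosity transitions; the abstract does not give that rule, so it is left out here.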

URL

https://arxiv.org/abs/2312.03177

PDF

https://arxiv.org/pdf/2312.03177.pdf

