Predictive Experience Replay for Continual Visual Control and Forecasting

2023-03-12 05:08:03
Wendong Zhang, Geng Chen, Xiangming Zhu, Siyu Gao, Yunbo Wang, Xiaokang Yang

Abstract

Learning physical dynamics in a series of non-stationary environments is a challenging but essential task for model-based reinforcement learning (MBRL) with visual inputs. It requires the agent to consistently adapt to novel tasks without forgetting previous knowledge. In this paper, we present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting. The key assumption is that an ideal world model can provide a non-forgetting environment simulator, which enables the agent to optimize the policy in a multi-task learning manner based on the imagined trajectories from the world model. To this end, we first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting, which we call predictive experience replay. Finally, we extend these methods to continual RL and further address the value estimation problems with the exploratory-conservative behavior learning approach. Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks. It is also shown to effectively alleviate the forgetting of spatiotemporal dynamics in video prediction datasets with evolving domains.

URL

https://arxiv.org/abs/2303.06572

PDF

https://arxiv.org/pdf/2303.06572.pdf

