
EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data

2024-03-01 14:42:25
Shengjie Wang, Shaohuai Liu, Weirui Ye, Jiacheng You, Yang Gao

Abstract

Sample efficiency remains a crucial challenge in applying Reinforcement Learning (RL) to real-world tasks. While recent algorithms have made significant strides in improving sample efficiency, none have achieved consistently superior performance across diverse domains. In this paper, we introduce EfficientZero V2, a general framework designed for sample-efficient RL algorithms. We have expanded the performance of EfficientZero to multiple domains, encompassing both continuous and discrete actions, as well as visual and low-dimensional inputs. With a series of improvements we propose, EfficientZero V2 outperforms the current state-of-the-art (SOTA) by a significant margin in diverse tasks under the limited data setting. EfficientZero V2 exhibits a notable advancement over the prevailing general algorithm, DreamerV3, achieving superior outcomes in 50 of 66 evaluated tasks across diverse benchmarks, such as Atari 100k, Proprio Control, and Vision Control.

URL

https://arxiv.org/abs/2403.00564

PDF

https://arxiv.org/pdf/2403.00564.pdf

