Abstract
Sample efficiency remains a crucial challenge in applying Reinforcement Learning (RL) to real-world tasks. While recent algorithms have made significant strides in improving sample efficiency, none has achieved consistently superior performance across diverse domains. In this paper, we introduce EfficientZero V2, a general framework for sample-efficient RL. We extend the performance of EfficientZero to multiple domains, covering both continuous and discrete actions as well as visual and low-dimensional inputs. With a series of proposed improvements, EfficientZero V2 outperforms the current state of the art (SOTA) by a significant margin on diverse tasks in the limited-data setting. It also improves notably on the prevailing general algorithm, DreamerV3, achieving superior results in 50 of 66 evaluated tasks across diverse benchmarks such as Atari 100k, Proprio Control, and Vision Control.
URL
https://arxiv.org/abs/2403.00564