Muesli: Combining Improvements in Policy Optimization

2021-04-13 13:04:29

Matteo Hessel, Ivo Danihelka, Fabio Viola, Arthur Guez, Simon Schmitt, Laurent Sifre, Theophane Weber, David Silver, Hado van Hasselt

arXiv_AI

Abstract

We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.

URL

https://arxiv.org/abs/2104.06159

PDF

https://arxiv.org/pdf/2104.06159.pdf