C3PO: Learning to Achieve Arbitrary Goals via Massively Entropic Pretraining

2022-11-07 13:02:40

Alexis Jacq, Manu Orsini, Gabriel Dulac-Arnold, Olivier Pietquin, Matthieu Geist, Olivier Bachem

arXiv_AI Unsupervised Pose

Abstract

Given a particular embodiment, we propose a novel method (C3PO) that learns policies able to achieve any arbitrary position and pose. Such a policy would allow for easier control and would be reusable as a key building block for downstream tasks. The method is two-fold. First, we introduce a novel exploration algorithm that optimizes for uniform coverage and is able to discover a set of achievable states, and we investigate its ability to attain both high coverage and hard-to-discover states. Second, we leverage this set of achievable states as training data for a universal goal-achievement policy, a goal-based SAC variant. We demonstrate the trained policy's performance in achieving a large number of novel states. Finally, we showcase the influence of massive unsupervised training of a goal-achievement policy with state-of-the-art pose-based control of the Hopper, Walker, Halfcheetah, Humanoid and Ant embodiments.
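The two stages described in the abstract can be illustrated with a toy sketch. This is not the C3PO algorithm: the abstract does not specify the exploration objective or the reward, so everything below is an assumption made for illustration. The hypothetical helpers `discretize`, `coverage_entropy`, `sample_goals` and `goal_reward` use a fixed grid to score how uniformly a set of visited states covers the space, draw goals evenly over discovered cells, and score a state by its negative Euclidean distance to a goal.

```python
import math
import random
from collections import Counter


def discretize(state, cell=0.5):
    """Map a continuous state (tuple of floats) to a grid cell (assumed binning)."""
    return tuple(math.floor(x / cell) for x in state)


def coverage_entropy(states, cell=0.5):
    """Shannon entropy (in nats) of the empirical distribution over visited cells.

    Higher values mean visits are spread more uniformly over more cells.
    """
    counts = Counter(discretize(s, cell) for s in states)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def sample_goals(states, n, rng, cell=0.5):
    """Draw n goals uniformly over visited cells, then uniformly within the cell.

    This avoids over-sampling regions that exploration happened to visit often.
    """
    by_cell = {}
    for s in states:
        by_cell.setdefault(discretize(s, cell), []).append(s)
    cells = sorted(by_cell)
    return [rng.choice(by_cell[rng.choice(cells)]) for _ in range(n)]


def goal_reward(state, goal):
    """Assumed goal-reaching reward: negative Euclidean distance to the goal."""
    return -math.dist(state, goal)


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for states discovered by an exploration policy.
    discovered = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(1000)]
    print("coverage entropy (nats):", coverage_entropy(discovered))
    goals = sample_goals(discovered, 3, rng)
    print("reward at origin for first goal:", goal_reward((0.0, 0.0), goals[0]))
```

In an actual pipeline, the sampled goals would condition an off-policy learner such as SAC; the grid, the entropy score and the distance reward here are placeholders chosen for simplicity, not details taken from the paper.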

URL

https://arxiv.org/abs/2211.03521

PDF

https://arxiv.org/pdf/2211.03521.pdf