Model-free Reinforcement Learning for Robust Locomotion Using Trajectory Optimization for Exploration

2021-07-14 12:07:19

Miroslav Bogdanovic, Majid Khadiv, Ludovic Righetti

arXiv_RO

arXiv_RO Reinforcement_Learning Optimization

Abstract

In this work, we present a general two-stage reinforcement learning approach that goes from a single demonstration trajectory to a robust policy deployable on hardware without any additional training. In the first stage, the demonstration serves as a starting point to facilitate initial exploration. In the second stage, the relevant task reward is optimized directly and a policy robust to environment uncertainties is computed. We demonstrate and examine in detail the performance and robustness of our approach on highly dynamic hopping and bounding tasks on a real quadruped robot.
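
The two-stage structure described above can be illustrated with a minimal sketch. This is not the authors' implementation: the reward forms, the function names (`tracking_reward`, `task_reward`, `sample_environment`, `reward`), the hop-height task, and all numeric ranges are assumptions chosen only to show how a demonstration-tracking reward in stage one might give way to a direct task reward with randomized environment parameters in stage two.

```python
import math
import random


def tracking_reward(state, demo_state, scale=5.0):
    """Stage 1 (assumed form): reward closeness to the demonstration state."""
    sq_err = sum((s - d) ** 2 for s, d in zip(state, demo_state))
    return math.exp(-scale * sq_err)


def task_reward(base_height, target_height):
    """Stage 2 (assumed form): score the task outcome directly, here a hop height."""
    return -abs(base_height - target_height)


def sample_environment(rng):
    """Stage 2: draw uncertain environment parameters once per episode.

    The parameter names and ranges are hypothetical placeholders.
    """
    return {
        "ground_friction": rng.uniform(0.5, 1.0),
        "ground_stiffness": rng.uniform(1e4, 1e5),
    }


def reward(stage, state, demo_state, base_height, target_height):
    """Select the reward signal according to the current training stage."""
    if stage == 1:
        return tracking_reward(state, demo_state)
    return task_reward(base_height, target_height)
```

In this sketch, stage one only needs the demonstration to shape exploration, while stage two ignores the demonstration entirely and trains across the environments drawn by `sample_environment`, which is one common way to encourage robustness to environment uncertainties.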

URL

https://arxiv.org/abs/2107.06629

PDF

https://arxiv.org/pdf/2107.06629.pdf