Abstract
Trajectory optimization with learned dynamics models often suffers from erroneous predictions on out-of-distribution trajectories. We propose to regularize trajectory optimization with a denoising autoencoder trained on the same trajectories as the dynamics model. We visually demonstrate the effectiveness of this regularization in gradient-based trajectory optimization for open-loop control of an industrial process. We also compare with recent model-based reinforcement learning algorithms on a set of popular motor control tasks and show that denoising regularization enables state-of-the-art sample efficiency. The proposed method is effective in regularizing both gradient-based and gradient-free trajectory optimization.
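To make the idea concrete, the following is a minimal sketch of regularized gradient-free planning (random shooting) on a toy 1-D problem. Everything in it is an assumption for illustration rather than the authors' implementation: the dynamics model `rollout`, the reward, and the weight `alpha` are hypothetical, the penalty form (the squared denoising residual subtracted from the predicted return) is not stated in the abstract, and `denoise` is a closed-form Gaussian denoiser standing in for a trained denoising autoencoder.

```python
import numpy as np


def rollout(s0, actions):
    """Hypothetical learned dynamics model: s_{t+1} = s_t + a_t.

    actions has shape (N, T); returns the predicted next states, shape (N, T).
    """
    return s0 + np.cumsum(actions, axis=-1)


def denoise(x, mean=0.0, data_var=1.0, noise_var=0.25):
    """Stand-in for a trained denoising autoencoder (assumption).

    This is the optimal denoiser when the training data is Gaussian with the
    given mean and variance and is corrupted by Gaussian noise of noise_var.
    """
    return mean + data_var / (data_var + noise_var) * (x - mean)


def denoising_penalty(states, actions):
    """Squared denoising residual summed over each trajectory.

    The residual is small near the training data and grows for
    out-of-distribution (state, action) pairs.
    """
    x = np.stack([states, actions], axis=-1)  # shape (N, T, 2)
    return np.sum((denoise(x) - x) ** 2, axis=(-2, -1))


def plan(s0, alpha, horizon=5, n_candidates=256, seed=0):
    """Random shooting: sample action sequences, keep the best-scoring one."""
    rng = np.random.default_rng(seed)
    actions = rng.normal(0.0, 2.0, size=(n_candidates, horizon))
    states = rollout(s0, actions)
    reward = states[:, -1]  # hypothetical reward: predicted final state
    penalty = denoising_penalty(states, actions)
    best = np.argmax(reward - alpha * penalty)  # regularized objective
    return actions[best], reward[best], penalty[best]


if __name__ == "__main__":
    for alpha in (0.0, 1.0):
        _, r, p = plan(0.0, alpha)
        print(f"alpha={alpha}: predicted reward={r:.2f}, penalty={p:.2f}")
```

With `alpha=0.0` the planner picks whichever candidate the model scores highest, however far it strays from the training distribution. With `alpha > 0` the chosen plan trades some predicted reward for a smaller denoising residual, which is the intended effect of the regularization.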
URL
https://arxiv.org/abs/1903.11981