Abstract
Imitation learning is a promising paradigm for training robot control policies, but these policies can suffer from distribution shift, where the conditions at evaluation time differ from those in the training data. A popular approach for increasing policy robustness to distribution shift is interactive imitation learning (i.e., DAgger and variants), where a human operator provides corrective interventions during policy rollouts. However, collecting a sufficient amount of interventions to cover the distribution of policy mistakes can be burdensome for human operators. We propose IntervenGen (I-Gen), a novel data generation system that can autonomously produce a large set of corrective interventions with rich coverage of the state space from a small number of human interventions. We apply I-Gen to 4 simulated environments and 1 physical environment with object pose estimation error and show that it can increase policy robustness by up to 39x with only 10 human interventions. Videos and more results are available at this https URL.
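To make the interactive imitation learning setting concrete, below is a minimal, hypothetical sketch of the DAgger-style loop the abstract refers to: roll out the policy, let a human override it with corrective actions when it drifts toward failure, and keep only those corrections for retraining. Every class and function name here (ToyEnv, policy, expert, needs_intervention) is an illustrative placeholder, not the paper's actual interface.

```python
# Hypothetical sketch of DAgger-style intervention collection.
# All names are illustrative stand-ins, not the paper's API.

import random


class ToyEnv:
    """1-D toy task: the state drifts by the chosen action; goal is state near 0."""
    def reset(self):
        self.state = random.uniform(-1.0, 1.0)
        return self.state

    def step(self, action):
        self.state += action
        done = abs(self.state) < 0.05
        return self.state, done


def policy(state):
    """Placeholder learned policy: an imperfect proportional controller."""
    return -0.3 * state


def expert(state):
    """Placeholder human correction: a stronger pull back toward the goal."""
    return -0.8 * state


def needs_intervention(state):
    """Stand-in for a human deciding to take over (e.g., near failure)."""
    return abs(state) > 0.5


def collect_interventions(num_rollouts=10, max_steps=50):
    data = []  # (state, corrective_action) pairs later used to retrain the policy
    env = ToyEnv()
    for _ in range(num_rollouts):
        state = env.reset()
        for _ in range(max_steps):
            if needs_intervention(state):
                action = expert(state)        # human takes over
                data.append((state, action))  # record only the correction
            else:
                action = policy(state)        # policy stays in control
            state, done = env.step(action)
            if done:
                break
    return data


if __name__ == "__main__":
    corrections = collect_interventions()
    print(f"collected {len(corrections)} corrective (state, action) pairs")
```

The burden the paper targets is the size of this corrective dataset: I-Gen's contribution is generating many such (state, correction) pairs automatically from only a handful of human-provided ones.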
URL
https://arxiv.org/abs/2405.01472