Abstract
In the context of imitation learning applied to dexterous robotic hands, the high complexity of the systems makes learning complex manipulation tasks challenging. However, the numerous datasets depicting human hands performing diverse tasks can provide better knowledge of human hand motion. We propose a method that leverages multiple large-scale, task-agnostic datasets to obtain latent representations that effectively encode motion subtrajectories, which we incorporate into a transformer-based behavior cloning method. Our results demonstrate that employing these latent representations yields better performance than conventional behavior cloning, particularly in resilience to errors and noise in perception and proprioception. Furthermore, the proposed approach relies solely on human demonstrations, eliminating the need for teleoperation and thereby accelerating data acquisition. Accurate inverse kinematics for fingertip retargeting ensures precise transfer from human hand data to the robot, facilitating effective learning and deployment of manipulation policies. Finally, the trained policies have been successfully transferred to a real-world 23-DoF robotic system.
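The abstract mentions inverse kinematics for retargeting human fingertip positions to the robot hand. The paper's actual solver is not reproduced here; the following is a minimal, illustrative sketch of one standard approach, damped least-squares IK on a toy planar three-joint finger. All function names, link lengths, and parameters below are assumptions for illustration, not values from the paper.

```python
import numpy as np

def fk(thetas, lengths):
    """Forward kinematics of a planar serial chain: fingertip (x, y)."""
    angles = np.cumsum(thetas)  # absolute joint angles
    return np.array([np.sum(lengths * np.cos(angles)),
                     np.sum(lengths * np.sin(angles))])

def jacobian(thetas, lengths):
    """2xN positional Jacobian of the planar chain."""
    angles = np.cumsum(thetas)
    n = len(thetas)
    J = np.zeros((2, n))
    for i in range(n):
        # Joint i moves every link from i onward.
        J[0, i] = -np.sum(lengths[i:] * np.sin(angles[i:]))
        J[1, i] = np.sum(lengths[i:] * np.cos(angles[i:]))
    return J

def retarget_fingertip(target, lengths, iters=200, damping=1e-2):
    """Damped least-squares IK: joint angles whose fingertip reaches `target`.

    `target` is a reachable human fingertip position expressed in the
    robot finger's frame (retargeting step assumed done upstream).
    """
    thetas = np.full(len(lengths), 0.1)  # small initial bend
    for _ in range(iters):
        err = target - fk(thetas, lengths)
        J = jacobian(thetas, lengths)
        # Step = (J^T J + lambda^2 I)^-1 J^T err  (damped pseudo-inverse)
        dtheta = np.linalg.solve(J.T @ J + damping**2 * np.eye(len(thetas)),
                                 J.T @ err)
        thetas = thetas + dtheta
    return thetas
```

The damping term keeps the step well-conditioned near kinematic singularities (e.g. a fully extended finger), which plain pseudo-inverse IK handles poorly.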
URL
https://arxiv.org/abs/2404.16483