Abstract
Existing motion imitation models typically require expert data captured with MoCap devices, but the large volume of training data needed is difficult to acquire and demands substantial investments of money, manpower, and time. This project combines 3D human pose estimation with reinforcement learning, proposing a novel model that reduces motion imitation to a joint-angle prediction problem in reinforcement learning. This significantly lowers the reliance on large training datasets: the agent can learn an imitation policy from just a few seconds of video, exhibits strong generalization, and can quickly apply the learned policy to imitate human arm motions in unfamiliar videos. The model first extracts skeletal arm motions from a given video using 3D human pose estimation. The extracted arm motions are then morphologically retargeted onto a robotic manipulator, and the retargeted motions are used to generate reference motions. Finally, the reference motions are used to formulate a reinforcement learning problem in which the agent learns a policy for imitating human arm motions. The model excels at imitation tasks and demonstrates robust transferability, accurately imitating human arm motions from other unfamiliar videos. Overall, this project provides a lightweight, convenient, efficient, and accurate motion imitation model that simplifies the complex motion-imitation pipeline while achieving notably strong performance.
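The pipeline described above (retargeting extracted joint angles onto a manipulator, then rewarding the agent for tracking the reference motion) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the clamping-based retargeting, the function names, and the exponential tracking reward (a common DeepMimic-style formulation) are all assumptions.

```python
import numpy as np

def retarget_arm_to_manipulator(human_joint_angles, joint_limits):
    """Map extracted human-arm joint angles onto a manipulator.

    Simplistic stand-in for morphological retargeting: clamp each
    human joint angle to the robot's joint limits. (The paper's
    actual retargeting scheme is not specified here.)
    """
    lo, hi = joint_limits[:, 0], joint_limits[:, 1]
    return np.clip(human_joint_angles, lo, hi)

def imitation_reward(q, q_ref, scale=2.0):
    """Exponential tracking reward on joint-angle error.

    Rewards the agent for predicting joint angles q close to the
    reference q_ref; equals 1.0 for perfect tracking and decays
    toward 0 as the error grows. (The exact reward used in the
    paper is an assumption.)
    """
    err = np.sum((q - q_ref) ** 2)
    return float(np.exp(-scale * err))
```

In this framing, each reinforcement-learning step asks the policy to output the next set of joint angles, and the reward measures how closely they track the reference motion generated from the video.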
URL
https://arxiv.org/abs/2405.01284