Abstract
We apply multi-agent deep reinforcement learning (RL) to train end-to-end robot soccer policies with fully onboard computation and sensing via egocentric RGB vision. This setting reflects many challenges of real-world robotics, including active perception, agile full-body control, and long-horizon planning in a dynamic, partially observable, multi-agent domain. We rely on large-scale, simulation-based data generation to obtain complex behaviors from egocentric vision that can be successfully transferred to physical robots using low-cost sensors. To achieve adequate visual realism, our simulation combines rigid-body physics with learned, realistic rendering via multiple Neural Radiance Fields (NeRFs). We combine teacher-based multi-agent RL and cross-experiment data reuse to enable the discovery of sophisticated soccer strategies. We analyze active-perception behaviors, including object tracking and ball seeking, that emerge when simply optimizing perception-agnostic soccer play. The agents display levels of performance and agility equivalent to those of policies with access to privileged, ground-truth state. To our knowledge, this paper constitutes a first demonstration of end-to-end training for multi-agent robot soccer, mapping raw pixel observations to joint-level actions, that can be deployed in the real world. Videos of the game-play and analyses can be seen on our website at this https URL.
URL
https://arxiv.org/abs/2405.02425