Abstract
A successful tactic followed by the scientific community for advancing AI is to treat games as benchmark problems, an approach that has repeatedly led to breakthroughs. We adopt this strategy to study Rocket League, a widely popular yet under-explored 3D multiplayer video game with a distinct physics engine and complex dynamics that pose significant challenges for developing efficient, high-performance game-playing agents. In this paper, we present Lucy-SKG, a Reinforcement Learning-based model that learned to play Rocket League in a sample-efficient manner, outperforming by a notable margin the two highest-ranking bots in the game, namely Necto (the 2022 bot champion) and its successor Nexto, thus becoming a state-of-the-art agent. Our contributions include: a) the development of a reward analysis and visualization library; b) novel parameterizable reward shape functions that capture the utility of complex reward types via our proposed Kinesthetic Reward Combination (KRC) technique; and c) the design of auxiliary neural architectures for training on reward prediction and state representation tasks in an on-policy fashion, for improved learning speed and performance. Through thorough ablation studies of each component of Lucy-SKG, we show their independent contributions to overall performance. In doing so, we demonstrate the prospects and challenges of using sample-efficient Reinforcement Learning techniques to control complex dynamical systems under competitive team-based multiplayer conditions.
URL
https://arxiv.org/abs/2305.15801