Abstract
In multi-robot systems, achieving coordinated missions remains a significant challenge due to the coupled nature of coordination behaviors and the lack of global information available to individual robots. To mitigate these challenges, this paper introduces a novel approach, Bi-level Coordination Learning (Bi-CL), that leverages a bi-level optimization structure within a centralized training and decentralized execution paradigm. Our bi-level reformulation decomposes the original problem into a reinforcement learning level with a reduced action space, and an imitation learning level that receives demonstrations from a global optimizer. Both levels contribute to improved learning efficiency and scalability. We note that robots' incomplete information leads to mismatches between the two levels' learning models. To address this, Bi-CL further integrates an alignment penalty mechanism that minimizes the discrepancy between the two levels without degrading their training efficiency. We introduce a running example to conceptualize the problem formulation and apply Bi-CL to two variations of this example: route-based and graph-based scenarios. Simulation results demonstrate that Bi-CL learns more efficiently than, and achieves performance comparable to, traditional multi-agent reinforcement learning baselines for multi-robot coordination.
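The abstract does not specify the exact form of the alignment penalty, so the following is a purely illustrative sketch, not the paper's method. It assumes the two levels each output an action distribution over a shared discrete action set, and models the penalty as a KL divergence added to a weighted sum of the two levels' losses; the names (`alignment_penalty`, `bi_level_loss`, weight `lam`) are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a logit vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def alignment_penalty(logits_rl, logits_il):
    # Hypothetical discrepancy measure between the two levels:
    # KL divergence between their induced action distributions.
    p, q = softmax(logits_rl), softmax(logits_il)
    return float(np.sum(p * np.log(p / q)))

def bi_level_loss(rl_loss, il_loss, logits_rl, logits_il, lam=0.1):
    # Illustrative combined objective: each level's own loss plus
    # a weighted penalty that discourages the levels from diverging.
    return rl_loss + il_loss + lam * alignment_penalty(logits_rl, logits_il)
```

Under this sketch, the penalty vanishes when the two levels agree and grows with their disagreement, so minimizing the combined loss pulls the decentralized RL level toward the globally informed imitation level.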
URL
https://arxiv.org/abs/2404.14649