Abstract
When agents that are independently trained (or designed) to complete their individual tasks are deployed in a shared environment, their joint actions may produce negative side effects (NSEs). Because their training does not account for the behavior of other agents or the effects of their joint actions on the environment, the agents have no prior knowledge of the NSEs of their actions. We model the problem of mitigating NSEs in a cooperative multi-agent system as a Lexicographic Decentralized Markov Decision Process with two objectives: the agents must optimize the completion of their assigned tasks while mitigating NSEs. We assume independence of transitions and rewards with respect to the agents' tasks, but the joint NSE penalty introduces a form of dependence in this setting. To improve scalability, the joint NSE penalty is decomposed into individual penalties for each agent via credit assignment, which facilitates decentralized policy computation. Simulation results on three domains demonstrate the effectiveness and scalability of our approach in mitigating NSEs by updating the policies of only a subset of agents in the system.
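The lexicographic ordering of the two objectives can be illustrated with a minimal sketch (not the paper's algorithm; the function name, tuple layout, and `slack` tolerance are assumptions for illustration): first maximize task value within a slack tolerance, then minimize the NSE penalty among the surviving candidates.

```python
# Hedged illustration of lexicographic two-objective selection.
# Each candidate policy is scored by (name, task_value, nse_penalty).

def lexicographic_select(policies, slack=0.0):
    """Pick the policy with the lowest NSE penalty among those whose
    task value is within `slack` of the best achievable task value."""
    best_task = max(v for _, v, _ in policies)
    # Primary objective: keep near-optimal task performers.
    admissible = [p for p in policies if p[1] >= best_task - slack]
    # Secondary objective: among those, minimize the NSE penalty.
    return min(admissible, key=lambda p: p[2])

candidates = [
    ("pi_a", 10.0, 5.0),
    ("pi_b", 10.0, 1.0),   # same task value as pi_a, fewer side effects
    ("pi_c", 12.0, 9.0),   # best task value, but a high NSE penalty
]
print(lexicographic_select(candidates, slack=2.0))  # -> ('pi_b', 10.0, 1.0)
```

With `slack=0.0` the selection collapses to pure task optimization (`pi_c` wins); a positive slack trades a bounded amount of task value for lower side-effect penalties.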
URL
https://arxiv.org/abs/2405.04702