Abstract
Inferring an adversary's goals from exhibited behavior is crucial for counterplanning and non-cooperative multi-agent systems in domains like cybersecurity, military operations, and strategy games. Deep Inverse Reinforcement Learning (IRL) methods based on maximum entropy principles show promise in recovering adversaries' goals, but they are typically offline, require large batches for gradient descent, and rely on first-order updates, limiting their applicability in real-time scenarios. We propose an online Recursive Deep Inverse Reinforcement Learning (RDIRL) approach to recover the cost function governing the adversary's actions and goals. Specifically, we minimize an upper bound on the standard Guided Cost Learning (GCL) objective using sequential second-order Newton updates, akin to the Extended Kalman Filter (EKF), leading to a fast-converging learning algorithm. We demonstrate that RDIRL recovers the cost and reward functions of expert agents in standard and adversarial benchmark tasks. Experiments on these benchmarks show that our proposed approach outperforms several leading IRL algorithms.
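The abstract's core mechanism, sequential second-order Newton updates akin to the EKF, can be illustrated with a minimal sketch. This is not the paper's actual GCL upper-bound objective; it applies a generic recursive Newton step (maintaining an inverse-Hessian matrix `P`, analogous to an EKF covariance) to a toy online least-squares loss, and the function name `recursive_newton_step` is hypothetical:

```python
import numpy as np

def recursive_newton_step(theta, P, grad, hess):
    """One EKF-style second-order update.

    theta: current parameter estimate
    P:     running inverse of the accumulated Hessian (EKF covariance analogue)
    grad:  gradient of the per-sample loss at theta
    hess:  Hessian of the per-sample loss at theta
    """
    # Fold the new curvature into the accumulated precision, then invert.
    P = np.linalg.inv(np.linalg.inv(P) + hess)
    # Newton step scaled by the updated inverse Hessian.
    theta = theta - P @ grad
    return theta, P

# Toy demo: recover the weights of a linear model from a stream of
# noisy observations, one sample at a time (no batches, no replay).
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
theta = np.zeros(2)
P = 10.0 * np.eye(2)  # broad initial "covariance" (weak prior)

for _ in range(200):
    x = rng.normal(size=2)
    y = true_w @ x + 0.01 * rng.normal()
    err = theta @ x - y          # residual of squared loss 0.5 * err**2
    grad = err * x               # per-sample gradient
    hess = np.outer(x, x)        # per-sample Hessian
    theta, P = recursive_newton_step(theta, P, grad, hess)
```

On this quadratic loss the recursion reduces to recursive least squares, so `theta` converges to `true_w` after a few hundred samples; the paper's contribution is applying this kind of sequential second-order update to a (non-quadratic) deep IRL objective.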
URL
https://arxiv.org/abs/2504.13241