Abstract
This work focuses on the persistent monitoring problem, in which a set of targets moving according to an unknown model must be monitored by an autonomous mobile robot with a limited sensing range. To keep each target's position estimate as accurate as possible, the robot needs to adaptively plan its path to (re-)visit all the targets and update its belief from measurements collected along the way. In doing so, the main challenge is to strike a balance between exploitation, i.e., re-visiting previously located targets, and exploration, i.e., finding new targets or re-acquiring lost ones. Encouraged by recent advances in deep reinforcement learning, we introduce an attention-based neural solution to the persistent monitoring problem, in which the agent learns the inter-dependencies between targets, i.e., their spatial and temporal correlations, conditioned on past measurements. This endows the agent with the ability to determine which target, time, and location to attend to across multiple scales, which we show also helps relax the usual limitation of a finite target set. We experimentally demonstrate that our method outperforms baselines in terms of number of target visits and average estimation error in complex environments. Finally, we implement and validate our model in a drone-based experiment that monitors mobile ground targets in a high-fidelity simulator.
URL
https://arxiv.org/abs/2303.06350