Abstract
This article presents a deep reinforcement learning-based approach to a persistent surveillance mission in which a single unmanned aerial vehicle, initially stationed at a depot and subject to fuel or time-of-flight constraints, must repeatedly visit a set of targets of equal priority. Because of these constraints, the vehicle must regularly return to the depot to refuel or recharge its battery. The objective is to determine an optimal sequence of target visits that minimizes the maximum time elapsed between successive visits to any target while ensuring that the vehicle never runs out of fuel or charge. We present a deep reinforcement learning algorithm to solve this problem and report numerical experiments that corroborate its effectiveness in comparison with common-sense greedy heuristics.
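To make the objective and the kind of baseline concrete, the sketch below evaluates the min-max revisit objective for a given visit sequence and builds one possible greedy route. It is not the article's method, and the exact greedy heuristics used in the article are not specified here. The following are all illustrative assumptions: the function names `max_revisit_time` and `greedy_route`, Euclidean targets, unit speed (so travel time equals distance), fuel measured in distance units, instantaneous refuelling at the depot, and a "most-neglected reachable target, else return to depot" greedy rule.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
DEPOT = -1  # hypothetical marker for a depot visit in a route


def dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def max_revisit_time(depot: Point, targets: Sequence[Point],
                     route: Sequence[int], fuel_capacity: float) -> float:
    """Simulate a route from the depot at t=0 with a full tank (unit speed).

    Returns the largest gap between successive visits to the same target;
    raises ValueError if the vehicle would run out of fuel.
    """
    t, fuel, pos = 0.0, fuel_capacity, depot
    last: List[Optional[float]] = [None] * len(targets)
    worst = 0.0
    for node in route:
        nxt = depot if node == DEPOT else targets[node]
        d = dist(pos, nxt)
        fuel -= d
        if fuel < -1e-9:
            raise ValueError("route is fuel-infeasible")
        t, pos = t + d, nxt
        if node == DEPOT:
            fuel = fuel_capacity  # assumed instantaneous refuel/recharge
        else:
            if last[node] is not None:
                worst = max(worst, t - last[node])
            last[node] = t
    return worst


def greedy_route(depot: Point, targets: Sequence[Point],
                 fuel_capacity: float, num_moves: int) -> List[int]:
    """Hypothetical greedy baseline: fly to the target that has waited longest
    (measured at arrival), among targets from which the depot is still
    reachable; if none qualifies, return to the depot to refuel."""
    t, fuel, pos, cur = 0.0, fuel_capacity, depot, DEPOT
    last = [0.0] * len(targets)  # assume every target was "seen" at t=0
    route: List[int] = []
    for _ in range(num_moves):
        best, best_age = DEPOT, -1.0
        for i, p in enumerate(targets):
            if i == cur:
                continue
            d = dist(pos, p)
            # Keep enough fuel to reach the target and then the depot.
            if d + dist(p, depot) <= fuel + 1e-9:
                age = t + d - last[i]
                if age > best_age:
                    best, best_age = i, age
        nxt = depot if best == DEPOT else targets[best]
        d = dist(pos, nxt)
        t, fuel, pos, cur = t + d, fuel - d, nxt, best
        if best == DEPOT:
            fuel = fuel_capacity
        else:
            last[best] = t
        route.append(best)
    return route


if __name__ == "__main__":
    depot, targets = (0.0, 0.0), [(1.0, 0.0), (-1.0, 0.0)]
    route = greedy_route(depot, targets, fuel_capacity=4.0, num_moves=6)
    print(route, max_revisit_time(depot, targets, route, 4.0))
```

With a depot at the origin, two targets at distance 1 on either side, and a fuel capacity of 4, this greedy rule produces the cycle target 0, target 1, depot, giving a maximum revisit time of 4. Because the feasibility check always reserves fuel for the return leg, the greedy route never strands the vehicle. A learned policy would replace only the choice of `best` at each step.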
URL
https://arxiv.org/abs/2404.06423