Abstract
Air traffic control is a real-time, safety-critical decision-making process in highly dynamic and stochastic environments. In today's aviation practice, a human air traffic controller monitors and directs the many aircraft flying through his or her designated airspace sector. With the fast-growing air traffic complexity in both traditional (commercial airliners) and low-altitude (drones and eVTOL aircraft) airspace, an autonomous air traffic control system is needed to accommodate high-density air traffic and ensure safe separation between aircraft. We propose a deep multi-agent reinforcement learning framework that can identify and resolve conflicts between aircraft in a high-density, stochastic, and dynamic en-route sector with multiple intersections and merging points. The proposed framework uses an actor-critic model (A2C) that incorporates the loss function from Proximal Policy Optimization (PPO) to help stabilize the learning process. In addition, we use a centralized-learning, decentralized-execution scheme in which a single neural network is learned and shared by all agents in the environment. We show that our framework is both scalable and efficient for large numbers of incoming aircraft, achieving extremely high traffic throughput with a safety guarantee. We evaluate our model via extensive simulations in the BlueSky environment. Results show that our framework resolves 99.97% and 100% of all conflicts at intersections and merging points, respectively, in extremely high-density air traffic scenarios.
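The abstract mentions borrowing the PPO loss function to stabilize A2C training. As a rough illustration only, here is a minimal sketch of PPO's clipped surrogate loss for a single sample; the function name, signature, and clip range of 0.2 are assumptions for illustration, not details from the paper.

```python
import math

def ppo_clip_loss(log_prob_new, log_prob_old, advantage, clip_eps=0.2):
    """Clipped surrogate loss from PPO for one (state, action) sample.

    Hypothetical helper: the ratio of new to old policy probabilities is
    clipped to [1 - clip_eps, 1 + clip_eps] so a single update cannot move
    the policy too far from the one that collected the data.
    """
    ratio = math.exp(log_prob_new - log_prob_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps) * advantage
    # PPO maximizes the minimum of the two terms; the loss is its negative.
    return -min(unclipped, clipped)
```

In practice this per-sample quantity would be averaged over a batch and combined with a value-function loss and an entropy bonus, as is standard in actor-critic training.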
URL
https://arxiv.org/abs/1905.01303