Abstract
In this paper, we investigate the problem of enabling a drone to fly through a tilted narrow gap without a traditional planning and control pipeline. To this end, we propose an end-to-end policy network that imitates the traditional pipeline and is fine-tuned using reinforcement learning. Unlike previous works, which plan dynamically feasible trajectories using motion primitives and track the generated trajectory with a geometric controller, our method takes the flight scenario as input and directly outputs thrust-attitude control commands for the quadrotor. The key contributions of our paper are: 1) presenting an imitate-reinforce training framework; 2) flying through a narrow gap using an end-to-end policy network, showing that learning-based methods can address the highly dynamic control problem as the traditional pipeline does (see attached video: https://www.youtube.com/watch?v=jU1qRcLdjx0); 3) proposing a robust imitation of an optimal trajectory generator using multilayer perceptrons; and 4) showing how reinforcement learning can improve the performance of imitation learning, with the potential to outperform the model-based method.
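The imitation stage described above can be illustrated with a minimal sketch, assuming a plain supervised regression setup: a small multilayer perceptron is trained to reproduce the commands of an expert. Everything concrete here is an assumption for illustration and not taken from the paper: the function `expert_command` is a hypothetical stand-in for the traditional planning and control pipeline, the 3-D state and 4-D command (thrust plus three attitude values) are placeholder dimensions, and the network size and learning rate are arbitrary. The reinforcement learning fine-tuning stage is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def expert_command(states):
    """Hypothetical expert: maps states to 4-D commands (thrust, 3 attitude values).
    This is a toy function, not the paper's trajectory generator or controller."""
    W = np.array([[0.5, -0.2, 0.1, 0.0],
                  [0.0, 0.3, -0.4, 0.2],
                  [0.1, 0.0, 0.2, -0.3]])
    return np.tanh(states @ W)

def init_mlp(n_in, n_hidden, n_out):
    """Randomly initialise a one-hidden-layer perceptron."""
    return {
        "W1": rng.normal(0.0, 0.5, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def policy(params, states):
    """Forward pass: returns predicted commands and the hidden activations."""
    hidden = np.tanh(states @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"], hidden

def imitation_step(params, states, targets, lr=0.05):
    """One full-batch gradient step on the mean squared imitation error."""
    pred, hidden = policy(params, states)
    err = pred - targets
    loss = float(np.mean(err ** 2))
    grad_out = 2.0 * err / err.size
    # Backpropagate through the tanh hidden layer before updating any weights.
    grad_hidden = (grad_out @ params["W2"].T) * (1.0 - hidden ** 2)
    params["W2"] -= lr * hidden.T @ grad_out
    params["b2"] -= lr * grad_out.sum(axis=0)
    params["W1"] -= lr * states.T @ grad_hidden
    params["b1"] -= lr * grad_hidden.sum(axis=0)
    return loss

# Collect expert demonstrations on random (placeholder) states, then imitate.
states = rng.normal(size=(256, 3))
targets = expert_command(states)
params = init_mlp(n_in=3, n_hidden=16, n_out=4)
losses = [imitation_step(params, states, targets) for _ in range(500)]
```

In a pipeline of the kind the abstract describes, a policy trained this way would serve as the starting point for reinforcement learning, which can then adjust the weights using task reward instead of expert labels.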
URL
https://arxiv.org/abs/1903.09088