TransVisDrone: Spatio-Temporal Transformer for Vision-based Drone-to-Drone Detection in Aerial Videos

2022-10-16 03:05:13

Tushar Sangam, Ishan Rajendrakumar Dave, Waqas Sultani, Mubarak Shah

arXiv_CV

arXiv_CV Detection Drone Optimization Transformer Pose

Abstract
Abstract (translated)
URL
PDF

Abstract

Drone-to-drone detection using visual feed has crucial applications like avoiding collision with other drones/airborne objects, tackling a drone attack or coordinating flight with other drones. However, the existing methods are computationally costly, follow a non-end-to-end optimization and have complex multi-stage pipeline, which make them less suitable to deploy on edge devices for real-time drone flight. In this work, we propose a simple-yet-effective framework TransVisDrone, which provides end-to-end solution with higher computational efficiency. We utilize CSPDarkNet-53 network to learn object-related spatial features and VideoSwin model to learn the spatio-temporal dependencies of drone motion which improves drone detection in challenging scenarios. Our method obtains state-of-the-art performance on three challenging real-world datasets (Average Precision@0.5IOU): NPS 0.95, FLDrones 0.75 and AOT 0.80. Apart from its superior performance, it achieves higher throughput than the prior work. We also demonstrate its deployment capability on edge-computing devices and usefulness in applications like drone-collision (encounter) detection. Code: \url{this https URL}.

Abstract (translated)

URL

https://arxiv.org/abs/2210.08423

PDF

https://arxiv.org/pdf/2210.08423.pdf