Paper Reading AI Learner

C2FDrone: Coarse-to-Fine Drone-to-Drone Detection using Vision Transformer Networks

2024-04-30 05:51:21
Sairam VC Rebbapragada, Pranoy Panda, Vineeth N Balasubramanian

Abstract

A vision-based drone-to-drone detection system is crucial for applications such as collision avoidance, countering hostile drones, and search-and-rescue operations. However, detecting drones presents unique challenges, including small object sizes, distortion, occlusion, and real-time processing requirements. Current methods that integrate multi-scale feature fusion and temporal information struggle with extreme blur and minuscule objects. To address these limitations, we propose a novel coarse-to-fine detection strategy based on vision transformers. We evaluate our approach on three challenging drone-to-drone detection datasets, achieving F1-score improvements of 7%, 3%, and 1% on the FL-Drones, AOT, and NPS-Drones datasets, respectively. Additionally, we demonstrate real-time processing by deploying our model on an edge-computing device. Our code will be made publicly available.
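The reported gains are in F1 score, which for detection is the harmonic mean of precision and recall after predicted boxes are matched to ground-truth boxes. The sketch below shows one common way to compute it. It is an illustration under stated assumptions, not the paper's evaluation protocol: the IoU threshold of 0.5, the greedy one-to-one matching, and the helper names `iou` and `detection_f1` are all hypothetical choices. The abstract also does not say whether the 7%/3%/1% gains are absolute or relative.

```python
from typing import List, Tuple

# Axis-aligned box as (x1, y1, x2, y2); this representation is an assumption.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_f1(preds: List[Box], gts: List[Box], iou_thr: float = 0.5) -> float:
    """Hypothetical F1 for one frame: greedy one-to-one matching at a fixed IoU threshold.

    Assumes `preds` are already sorted by confidence (highest first).
    """
    matched = set()  # indices of ground-truth boxes already claimed
    tp = 0
    for p in preds:
        best_j, best_iou = -1, iou_thr
        for j, g in enumerate(gts):
            if j in matched:
                continue
            v = iou(p, g)
            if v >= best_iou:  # keep the best unmatched GT at or above the threshold
                best_j, best_iou = j, v
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp  # predictions with no matching ground truth
    fn = len(gts) - tp    # ground-truth drones that were missed
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    preds = [(0, 0, 10, 10), (50, 50, 60, 60)]
    gts = [(1, 1, 11, 11), (100, 100, 110, 110)]
    print(detection_f1(preds, gts))  # one TP, one FP, one FN -> 0.5
```

With one correct detection, one false alarm, and one missed drone, precision and recall are both 0.5, so F1 is 0.5. For tiny, blurred drones a small localization error can drop IoU below the threshold, which is one reason detection F1 is sensitive to the small-object regime the abstract highlights.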

URL

https://arxiv.org/abs/2404.19276

PDF

https://arxiv.org/pdf/2404.19276.pdf

