Super accurate low latency object detection on a surveillance UAV

2019-04-03 14:29:07
Maarten Vandersteegen, Kristof Vanbeeck, Toon Goedemé

Abstract

Drones have proven to be useful in many industry segments such as security and surveillance, where, e.g., on-board real-time object tracking is a necessity for autonomous flying guards. Tracking and following suspicious objects must therefore be done in real time on limited hardware. With an object detector in the loop, low latency becomes extremely important. In this paper, we propose a solution that makes object detection for UAVs both fast and super accurate. We propose a multi-dataset learning strategy that yields top eye-in-the-sky object detection accuracy. Our model generalizes well to unseen data and copes with different flying heights, optically zoomed-in shots and different viewing angles. We apply optimization steps that minimize latency on embedded on-board hardware: we fuse layers and quantize calculations to 16-bit floats and 8-bit integers, with negligible loss in accuracy. We validate on NVIDIA's Jetson TX2 and Jetson Xavier platforms, where we achieve a speedup of more than 10x.
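The abstract names two latency optimizations, layer fusion and reduced-precision (FP16/INT8) arithmetic, without spelling out how they work. The Python sketch below illustrates the generic techniques under our own assumptions and is not the authors' pipeline. It folds a BatchNorm layer into a preceding 1x1 convolution (treated as a per-channel dot product), round-trips values through IEEE 754 half precision, and applies symmetric per-tensor INT8 quantization with a single max-abs scale. All function names and example numbers are hypothetical.

```python
import math
import struct

# Hypothetical helpers illustrating generic techniques; not the paper's implementation.

def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BN(y) = gamma * (y - mean) / sqrt(var + eps) + beta into the
    preceding layer. weights[c] holds the weights of output channel c."""
    fused_w, fused_b = [], []
    for c in range(len(weights)):
        scale = gamma[c] / math.sqrt(var[c] + eps)
        fused_w.append([w * scale for w in weights[c]])
        fused_b.append((bias[c] - mean[c]) * scale + beta[c])
    return fused_w, fused_b

def linear(x, weights, bias):
    # A 1x1 convolution applied to one pixel is a dot product per output channel.
    return [sum(w * xi for w, xi in zip(weights[c], x)) + bias[c]
            for c in range(len(weights))]

def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    # Inference-mode BatchNorm using stored running statistics.
    return [gamma[c] * (y[c] - mean[c]) / math.sqrt(var[c] + eps) + beta[c]
            for c in range(len(y))]

def to_fp16(x):
    # Round-trip through IEEE 754 half precision (struct format code "e").
    return struct.unpack("<e", struct.pack("<e", x))[0]

def quantize_int8(values):
    # Symmetric per-tensor quantization: map [-max|x|, max|x|] onto [-127, 127].
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    return [qi * scale for qi in q]

if __name__ == "__main__":
    x = [0.5, -1.2, 2.0]
    W = [[0.1, 0.2, -0.3], [0.4, -0.5, 0.6]]
    b = [0.05, -0.1]
    gamma, beta = [1.5, 0.8], [0.2, -0.3]
    mean, var = [0.1, -0.2], [0.9, 1.1]

    # Two layers (conv then BN) versus one fused layer: same output.
    reference = batchnorm(linear(x, W, b), gamma, beta, mean, var)
    fw, fb = fold_batchnorm(W, b, gamma, beta, mean, var)
    print("unfused:", reference)
    print("fused:  ", linear(x, fw, fb))

    # Reduced precision: bounded rounding error in exchange for cheaper math.
    q, s = quantize_int8(x)
    print("int8:", q, "scale:", s)
    print("dequantized:", dequantize_int8(q, s))
    print("fp16:", [to_fp16(v) for v in x])
```

Fusion removes an entire layer at inference time without changing the result (up to floating-point rounding), whereas FP16 and INT8 accept a bounded rounding error in exchange for cheaper arithmetic. Production toolchains often calibrate INT8 scales on representative data instead of using a single max-abs value as this sketch does.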

URL

https://arxiv.org/abs/1904.02024

PDF

https://arxiv.org/pdf/1904.02024.pdf