Fast and Accurate, Convolutional Neural Network Based Approach for Object Detection from UAV

2018-08-16 13:22:00
Xiaoliang Wang, Peng Cheng, Xinchuan Liu, Benedict Uzochukwu

Abstract

The growing interest in acquiring and developing unmanned aerial vehicles (UAVs), commonly known as drones, over the past few years has produced a promising and effective technology. Because of their small size and fast deployment, UAVs are effective at collecting data over unreachable areas and restricted coverage zones. Moreover, their flexibly defined capacity enables them to collect information at a very high level of detail, yielding high-resolution images. UAVs have mainly served in military scenarios; in the last decade, however, they have been broadly adopted in civilian applications as well. Aerial surveillance and situation awareness are usually achieved by integrating intelligence, surveillance, observation, and navigation systems, all interacting within the same operational framework. UAVs are well suited to building this capability, as they can be equipped with a wide variety of sensors, such as cameras or radars. Deep learning is widely recognized as a prominent approach in many computer vision applications. Specifically, one-stage and two-stage object detectors are regarded as the two most important groups of Convolutional Neural Network based object detection methods. One-stage detectors usually outperform two-stage detectors in speed but normally trail them in detection accuracy. In this study, the focal loss based RetinaNet, a one-stage object detector, is used for UAV based object detection to match the speed of regular one-stage detectors while surpassing two-stage detectors in accuracy. State-of-the-art performance is demonstrated on a dataset of UAV-captured images, the Stanford Drone Dataset (SDD).
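
The abstract names focal loss as the ingredient that lets a one-stage detector close the accuracy gap, but does not spell it out. As a rough illustration only, the sketch below implements the binary focal loss in its commonly cited form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function names and the defaults gamma = 2 and alpha = 0.25 are assumptions taken from the usual RetinaNet formulation, not values stated in this abstract, and this is not the authors' training code.

```python
import math


def cross_entropy(p, y):
    """Plain binary cross-entropy for one prediction.

    p: predicted probability of the positive class, strictly in (0, 1)
    y: ground-truth label, 1 (object) or 0 (background)
    """
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction (illustrative sketch).

    gamma and alpha defaults are assumed, not taken from the abstract.
    The (1 - p_t) ** gamma factor shrinks the loss of well-classified
    examples so that the many easy background anchors do not dominate.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # Compare an easy positive (p = 0.9) with a hard positive (p = 0.1).
    for p in (0.9, 0.1):
        print(f"p={p}: CE={cross_entropy(p, 1):.4f}  FL={focal_loss(p, 1):.4f}")
```

With gamma set to 0 the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy, which is a convenient sanity check when experimenting with the parameters.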

URL

https://arxiv.org/abs/1808.05756

PDF

https://arxiv.org/pdf/1808.05756.pdf

