Clustered Object Detection in Aerial Images

2019-04-16 23:01:53
Fan Yang, Heng Fan, Peng Chu, Erik Blasch, Haibin Ling

Abstract

Detecting objects in aerial images is challenging for at least two reasons: (1) target objects such as pedestrians occupy very few pixels, making them hard to distinguish from the surrounding background; and (2) targets are generally sparsely and nonuniformly distributed, making detection very inefficient. In this paper we address both issues, inspired by the observation that these targets are often clustered. In particular, we propose a Clustered Detection (ClusDet) network that unifies object clustering and detection in an end-to-end framework. The key components of ClusDet are a cluster proposal sub-network (CPNet), a scale estimation sub-network (ScaleNet), and a dedicated detection network (DetecNet). Given an input image, CPNet produces (object) cluster regions and ScaleNet estimates object scales for these regions. Then, each scale-normalized cluster region and its features are fed into DetecNet for object detection. Compared with previous solutions, ClusDet has several advantages: (1) it greatly reduces the number of blocks for final object detection and hence achieves high running-time efficiency; (2) the cluster-based scale estimation is more accurate than previously used single-object-based estimation, and hence effectively improves the detection of small objects; and (3) the final DetecNet is dedicated to clustered regions and implicitly models prior context information, boosting detection accuracy. The proposed method is tested on three representative aerial image datasets: VisDrone, UAVDT, and DOTA. In all experiments, ClusDet achieves promising performance in both efficiency and accuracy compared with state-of-the-art detectors.
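The data flow described in the abstract (cluster regions, per-region scale, detection on each scale-normalized region) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `chip_to_image`, `clustered_detect`, and `detect_chip` are hypothetical, the detector is passed in as a plain callable, and the step that maps detections back to full-image coordinates is an assumption about how per-region results would be combined, not something the abstract states.

```python
from typing import Callable, List, Tuple

# Axis-aligned box as (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def chip_to_image(box: Box, region: Box, scale: float) -> Box:
    """Map a box found in a rescaled cluster region back to image coordinates.

    Assumes the region was cropped at its top-left corner and resized
    uniformly by `scale` before detection (an illustrative assumption).
    """
    rx1, ry1, _, _ = region
    x1, y1, x2, y2 = box
    return (rx1 + x1 / scale, ry1 + y1 / scale,
            rx1 + x2 / scale, ry1 + y2 / scale)


def clustered_detect(
    regions: List[Box],
    scales: List[float],
    detect_chip: Callable[[Box, float], List[Box]],
) -> List[Box]:
    """Run a detector on each cluster region and collect image-space boxes.

    `regions` stands in for cluster proposals, `scales` for the estimated
    per-region scale factors, and `detect_chip` for the dedicated detector.
    All three are hypothetical placeholders.
    """
    results: List[Box] = []
    for region, scale in zip(regions, scales):
        for box in detect_chip(region, scale):
            results.append(chip_to_image(box, region, scale))
    return results


def stub_detector(region: Box, scale: float) -> List[Box]:
    """Stand-in detector: always reports one fixed box in chip coordinates."""
    return [(10.0, 20.0, 30.0, 40.0)]


boxes = clustered_detect([(100.0, 200.0, 300.0, 400.0)], [2.0], stub_detector)
print(boxes)
```

With the stub detector, a region whose top-left corner is (100, 200) and that was upscaled by a factor of 2 yields one box in full-image coordinates. A real pipeline would also need to merge overlapping detections from neighbouring regions, which this sketch leaves out.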

URL

https://arxiv.org/abs/1904.08008

PDF

https://arxiv.org/pdf/1904.08008.pdf

