Paper Reading AI Learner

NAS-FCOS: Fast Neural Architecture Search for Object Detection

2019-06-11 07:55:41
Ning Wang, Yang Gao, Hao Chen, Peng Wang, Zhi Tian, Chunhua Shen

Abstract

The success of deep neural networks relies on significant architecture engineering. Recently, neural architecture search (NAS) has emerged as a promising approach to greatly reduce the manual effort in network design by automatically searching for optimal architectures, although such algorithms typically need an excessive amount of computational resources, e.g., a few thousand GPU-days. To date, on challenging vision tasks such as object detection, NAS, especially fast versions of NAS, is less studied. Here we propose to search for the decoder structure of object detectors with search efficiency taken into consideration. To be more specific, we aim to efficiently search for the feature pyramid network (FPN) as well as the prediction head of a simple anchor-free object detector, namely FCOS [20], using a tailored reinforcement learning paradigm. With carefully designed search space, search algorithms and strategies for evaluating network quality, we are able to efficiently search more than 2,000 architectures in around 30 GPU-days. The discovered architecture surpasses state-of-the-art object detection models (such as Faster R-CNN, RetinaNet and FCOS) by 1 to 1.9 points in AP on the COCO dataset, with comparable computational complexity and memory footprint, demonstrating the efficacy of the proposed NAS for object detection.
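To make the reinforcement-learning search loop described above concrete, here is a toy, self-contained sketch of a REINFORCE-style controller: it repeatedly samples a decoder configuration, scores it with a cheap proxy evaluator (the paper likewise avoids fully training each of the 2,000+ candidates), and nudges its sampling distribution toward higher-reward choices. The search space, the proxy reward, and all names below are illustrative assumptions, not the paper's actual design.

```python
import math
import random

random.seed(0)

# Hypothetical discrete search space for a decoder (FPN + head), purely
# illustrative -- the paper's real search space is different and richer.
SEARCH_SPACE = {
    "fpn_op":     ["conv3x3", "sep_conv3x3", "skip"],
    "head_op":    ["conv3x3", "dilated_conv3x3"],
    "head_depth": [2, 4, 6],
}

# Stand-in proxy reward: a fake per-choice quality score, mimicking a fast
# evaluation of a sampled architecture instead of full training.
_FAKE_SCORE = {"conv3x3": 0.30, "sep_conv3x3": 0.35, "skip": 0.20,
               "dilated_conv3x3": 0.33, 2: 0.10, 4: 0.15, 6: 0.12}

def proxy_reward(arch):
    return sum(_FAKE_SCORE[v] for v in arch.values())

# Controller state: one learnable preference (logit) per discrete choice.
prefs = {(k, c): 0.0 for k, cs in SEARCH_SPACE.items() for c in cs}

def sample_architecture():
    """Sample one configuration, choice by choice, from softmax(prefs)."""
    arch = {}
    for key, choices in SEARCH_SPACE.items():
        weights = [math.exp(prefs[(key, c)]) for c in choices]
        arch[key] = random.choices(choices, weights=weights, k=1)[0]
    return arch

baseline, lr = 0.0, 0.5
for step in range(500):
    arch = sample_architecture()
    reward = proxy_reward(arch)
    baseline = 0.9 * baseline + 0.1 * reward       # moving-average baseline
    advantage = reward - baseline
    for key, choice in arch.items():               # REINFORCE-style update:
        prefs[(key, choice)] += lr * advantage     # reinforce sampled choices

# Greedy read-out of the controller's preferred architecture.
best = {k: max(cs, key=lambda c: prefs[(k, c)]) for k, cs in SEARCH_SPACE.items()}
print(best)
```

After a few hundred iterations the controller's preferences concentrate on the higher-scoring choices; the real system replaces the stub reward with an actual quality estimate of the candidate FPN/head on detection data.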

URL

https://arxiv.org/abs/1906.04423

PDF

https://arxiv.org/pdf/1906.04423.pdf

