Differentiable Neural Architecture Search via Proximal Iterations

2019-05-30 16:24:09
Quanming Yao, Ju Xu, Wei-Wei Tu, Zhanxing Zhu

Abstract

Neural architecture search (NAS) has recently attracted much research attention because of its ability to identify better architectures than handcrafted ones. However, many NAS methods, which optimize the search process in a discrete search space, need many GPU days to converge. DARTS, which constructs a differentiable search space and then optimizes it by gradient descent, obtains high-performance architectures and reduces the search time to several days. However, DARTS is still slow because it updates an ensemble of all operations and keeps only one after convergence. In addition, DARTS can converge to inferior architectures due to the strong correlation among operations. In this paper, we propose a new differentiable Neural Architecture Search method based on Proximal gradient descent (denoted as NASP). Unlike DARTS, NASP reformulates the search process as an optimization problem with the constraint that only one operation is allowed to be updated during forward and backward propagation. Since this constraint is hard to handle directly, we propose a new algorithm inspired by proximal iterations to solve it. Experiments on various tasks demonstrate that NASP obtains high-performance architectures with a 10x speedup in computation time over DARTS.
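
The abstract does not spell out the algorithm, so the following is only a toy sketch of the general idea, under stated assumptions: a continuous architecture vector is kept per edge, a proximal (projection) step selects a single operation before each forward pass, and only that operation's weight receives a gradient update. The names `project_one_hot`, `OPS`, and `search`, the scalar stand-in operations, and the squared-error loss are all hypothetical and are not taken from the paper; network weights are omitted entirely.

```python
def clip01(v):
    """Project a scalar onto the box [0, 1]."""
    return min(max(v, 0.0), 1.0)


def project_one_hot(alpha):
    """Assumed proximal step: keep only the largest entry (clipped to [0, 1])
    and zero the rest, so exactly one operation is active."""
    best = max(range(len(alpha)), key=lambda k: alpha[k])
    alpha_bar = [0.0] * len(alpha)
    alpha_bar[best] = clip01(alpha[best])
    return best, alpha_bar


# Hypothetical candidate operations on one edge (stand-ins for conv, pooling, ...).
OPS = [lambda x: 0.5 * x, lambda x: 2.0 * x, lambda x: -x]


def search(steps=50, lr=0.1, x=1.0, target=2.0):
    """Toy search loop: only the selected operation is evaluated and updated."""
    alpha = [0.3, 0.35, 0.4]  # continuous architecture weights
    for _ in range(steps):
        k, alpha_bar = project_one_hot(alpha)  # discretize before the forward pass
        out = OPS[k](x)                        # forward through the active op only
        err = alpha_bar[k] * out - target      # derivative of 0.5 * err**2 w.r.t. output
        # Gradient step on the continuous weight of the active op, then box projection.
        alpha[k] = clip01(alpha[k] - lr * err * out)
    return project_one_hot(alpha)


if __name__ == "__main__":
    chosen, alpha_bar = search()
    print("selected operation index:", chosen)
    print("discrete architecture:", alpha_bar)
```

In this toy run the initially largest weight belongs to a poorly fitting operation, so its weight shrinks after one update and the projection switches to a different operation on the next iteration. Each iteration evaluates one operation instead of the full mixture, which is the source of the cost saving the abstract attributes to the single-operation constraint.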

URL

https://arxiv.org/abs/1905.13577

PDF

https://arxiv.org/pdf/1905.13577.pdf

