Paper Reading AI Learner

Multinomial Distribution Learning for Effective Neural Architecture Search

2019-05-23 07:18:24
Xiawu Zheng, Rongrong Ji, Lang Tang, Baochang Zhang, Jianzhuang Liu, Qi Tian
     

Abstract

Architectures obtained by Neural Architecture Search (NAS) have achieved highly competitive performance in various computer vision tasks. However, the prohibitive computational demand of forward-backward propagation in deep neural networks and of the searching algorithms makes it difficult to apply NAS in practice. In this paper, we propose Multinomial Distribution Learning for extremely effective NAS, which treats the search space as a joint multinomial distribution: the operation between two nodes is sampled from this distribution, and the optimal network structure is obtained from the operations with the highest probabilities. NAS is thereby transformed into a multinomial distribution learning problem, i.e., the distribution is optimized to have a high expectation of performance. In addition, we propose and validate the hypothesis that the performance ranking is consistent across training epochs, which further accelerates the learning process. Experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of our method. On CIFAR-10, the structure searched by our method achieves 2.4% test error while being 6.0× faster (only 4 GPU hours on a GTX 1080Ti) than state-of-the-art NAS algorithms. On ImageNet, our model achieves 75.2% top-1 accuracy under MobileNet settings (MobileNet V1/V2) while being 1.2× faster in measured GPU latency. Test code is available at this https URL
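The abstract's core idea can be sketched in a few lines: each edge of the network holds a multinomial distribution over candidate operations, architectures are sampled from it, the distribution is nudged toward well-performing samples, and the final architecture takes the most likely operation per edge. The following toy Python sketch illustrates that loop under assumptions of my own (the candidate operation set, the simple reward-weighted update rule, and all names are hypothetical illustrations, not the paper's actual implementation):

```python
import random

# Toy sketch of multinomial-distribution-based architecture search.
# OPS and the update rule are illustrative assumptions, not the paper's method.
OPS = ["conv3x3", "conv5x5", "max_pool", "skip_connect"]

def sample_architecture(probs, rng):
    """Sample one operation index per edge from each edge's multinomial distribution."""
    return [rng.choices(range(len(OPS)), weights=p)[0] for p in probs]

def update(probs, arch, reward, lr=0.1):
    """Shift each edge's probability mass toward its sampled op, scaled by reward."""
    for p, op in zip(probs, arch):
        for i in range(len(p)):
            target = 1.0 if i == op else 0.0
            p[i] += lr * reward * (target - p[i])
        s = sum(p)
        for i in range(len(p)):
            p[i] /= s  # renormalize to keep a valid distribution

def derive(probs):
    """Final architecture: the most likely operation on every edge."""
    return [max(range(len(OPS)), key=lambda i: p[i]) for p in probs]
```

In a real search, `reward` would come from evaluating the sampled network (e.g., validation accuracy after partial training, exploiting the performance-ranking hypothesis to avoid full convergence); here it is left abstract.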

Abstract (translated)

神经结构搜索(NAS)所获得的体系结构在各种计算机视觉任务中都取得了很强的竞争性能。然而,深度神经网络中前向-反向传播以及搜索算法的计算开销过高,使得NAS难以在实际中应用。本文提出了一种非常高效的NAS多项式分布学习方法,将搜索空间看作一个联合多项式分布,即两个节点之间的操作从该分布中采样,并由该分布中概率最大的操作得到最优网络结构。因此,NAS可以转化为一个多项式分布学习问题,即对该分布进行优化,使其具有较高的性能期望。此外,本文还提出并验证了"性能排名在每个训练阶段保持一致"的假设,以进一步加速学习过程。在CIFAR-10和ImageNet上的实验证明了该方法的有效性。在CIFAR-10上,我们的方法搜索到的结构达到2.4%的测试误差,同时比最先进的NAS算法快6.0倍(在GTX 1080Ti上仅需4个GPU小时)。在ImageNet上,我们的模型在MobileNet设置(MobileNet V1/V2)下达到75.2%的top-1精度,同时在实测GPU延迟下快1.2倍。测试代码可在此https URL获取。

URL

https://arxiv.org/abs/1905.07529

PDF

https://arxiv.org/pdf/1905.07529.pdf

