
BayesNAS: A Bayesian Approach for Neural Architecture Search

2019-05-13 09:00:13
Hongpeng Zhou, Minghao Yang, Jun Wang, Wei Pan

Abstract

One-Shot Neural Architecture Search (NAS) is a promising method to significantly reduce search time without any separate training. It can be treated as a network compression problem on the architecture parameters of an over-parameterized network. However, there are two issues associated with most one-shot NAS methods. First, dependencies between a node and its predecessors and successors are often disregarded, which results in improper treatment of zero operations. Second, pruning architecture parameters based on their magnitude is questionable. In this paper, we employ the classic Bayesian learning approach to alleviate these two issues by modeling architecture parameters using hierarchical automatic relevance determination (HARD) priors. Unlike other NAS methods, we train the over-parameterized network for only one epoch and then update the architecture. Impressively, this enables us to find the architecture in both proxy and proxyless tasks on CIFAR-10 within only 0.2 GPU days using a single GPU. As a byproduct, our approach can be transferred directly to compress convolutional neural networks by enforcing structural sparsity, which achieves extremely sparse networks without accuracy deterioration.
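The key ingredient named in the abstract is an automatic relevance determination (ARD) prior: every parameter gets its own precision hyperparameter, and parameters whose learned precision diverges are pruned, rather than pruning by raw magnitude. The snippet below is a minimal sketch of that mechanism on a toy linear-regression surrogate, not the paper's actual NAS formulation; the data setup, fixed noise precision, and pruning threshold are illustrative assumptions.

```python
import numpy as np

# Toy ARD (sparse Bayesian learning) sketch: per-parameter precisions `alpha`
# decide which parameters survive, instead of pruning by raw magnitude.
# The linear-regression setup below is an illustrative stand-in for the
# architecture parameters of an over-parameterized network.

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
true_w = np.zeros(d)
true_w[:3] = [2.0, -1.5, 0.8]          # only 3 parameters are truly relevant
y = X @ true_w + 0.1 * rng.normal(size=n)

alpha = np.ones(d)                      # ARD precisions, one per parameter
beta = 100.0                            # noise precision (assumed known here)

for _ in range(50):
    # Gaussian posterior over the weights given the current hyperparameters
    Sigma = np.linalg.inv(beta * X.T @ X + np.diag(alpha))
    mu = beta * Sigma @ X.T @ y
    # MacKay-style fixed-point update of the per-parameter precisions
    gamma = 1.0 - alpha * np.diag(Sigma)
    alpha = gamma / (mu ** 2 + 1e-12)

# Large precision means the prior has pinned the parameter to zero: prune it.
keep = alpha < 1e3
print("kept parameters:", np.where(keep)[0])
print("posterior means:", mu[keep].round(2))
```

In this toy setting the iteration drives the precisions of the irrelevant parameters to very large values, so pruning follows the learned relevance rather than the raw weight magnitude, which is the distinction the abstract draws against magnitude-based pruning.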

URL

https://arxiv.org/abs/1905.04919

PDF

https://arxiv.org/pdf/1905.04919.pdf

