
BayesNAS: A Bayesian Approach for Neural Architecture Search

2019-05-13 09:00:13
Hongpeng Zhou, Minghao Yang, Jun Wang, Wei Pan

Abstract

One-Shot Neural Architecture Search (NAS) is a promising method to significantly reduce search time without any separate training. It can be treated as a network compression problem on the architecture parameters of an over-parameterized network. However, there are two issues associated with most one-shot NAS methods. First, dependencies between a node and its predecessors and successors are often disregarded, which results in improper treatment of zero operations. Second, pruning architecture parameters based on their magnitude is questionable. In this paper, we employ the classic Bayesian learning approach to alleviate these two issues by modeling architecture parameters using hierarchical automatic relevance determination (HARD) priors. Unlike other NAS methods, we train the over-parameterized network for only one epoch and then update the architecture. Impressively, this enables us to find the architecture in both proxy and proxyless tasks on CIFAR-10 within only 0.2 GPU days using a single GPU. As a byproduct, our approach can be transferred directly to compress convolutional neural networks by enforcing structural sparsity, which achieves extremely sparse networks without accuracy deterioration.
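The key ingredient named in the abstract is an automatic relevance determination (ARD) prior: every parameter gets its own precision hyperparameter, and parameters whose learned precision diverges are pruned, rather than pruning by raw magnitude. The snippet below is a minimal sketch of that mechanism on a toy linear-regression surrogate, not the paper's actual NAS formulation; the data setup, fixed noise precision, and pruning threshold are illustrative assumptions.

```python
import numpy as np

# Toy ARD (sparse Bayesian learning) sketch: per-parameter precisions `alpha`
# decide which parameters survive, instead of pruning by raw magnitude.
# The linear-regression setup below is an illustrative stand-in for the
# architecture parameters of an over-parameterized network.

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
true_w = np.zeros(d)
true_w[:3] = [2.0, -1.5, 0.8]          # only 3 parameters are truly relevant
y = X @ true_w + 0.1 * rng.normal(size=n)

alpha = np.ones(d)                      # ARD precisions, one per parameter
beta = 100.0                            # noise precision (assumed known here)

for _ in range(50):
    # Gaussian posterior over the weights given the current hyperparameters
    Sigma = np.linalg.inv(beta * X.T @ X + np.diag(alpha))
    mu = beta * Sigma @ X.T @ y
    # MacKay-style fixed-point update of the per-parameter precisions
    gamma = 1.0 - alpha * np.diag(Sigma)
    alpha = gamma / (mu ** 2 + 1e-12)

# Large precision means the prior has pinned the parameter to zero: prune it.
keep = alpha < 1e3
print("kept parameters:", np.where(keep)[0])
print("posterior means:", mu[keep].round(2))
```

In this toy setting the iteration drives the precisions of the irrelevant parameters to very large values, so pruning follows the learned relevance rather than the raw weight magnitude, which is the distinction the abstract draws against magnitude-based pruning.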

URL

https://arxiv.org/abs/1905.04919

PDF

https://arxiv.org/pdf/1905.04919.pdf

