Abstract
Architectures obtained by Neural Architecture Search (NAS) have achieved highly competitive performance in various computer vision tasks. However, the prohibitive computational demand of forward-backward propagation in deep neural networks and search algorithms makes it difficult to apply NAS in practice. In this paper, we propose Multinomial Distribution Learning for extremely effective NAS, which treats the search space as a joint multinomial distribution: the operation between two nodes is sampled from this distribution, and the optimal network structure is obtained by taking the operation with the highest probability on each edge. NAS is thereby transformed into a multinomial distribution learning problem, i.e., the distribution is optimized to have a high expected performance. In addition, we propose and demonstrate the hypothesis that the performance ranking of architectures is consistent across training epochs, which further accelerates the learning process. Experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of our method. On CIFAR-10, the structure searched by our method achieves 2.4% test error while being 6.0× faster (only 4 GPU hours on a GTX 1080Ti) than state-of-the-art NAS algorithms. On ImageNet, our model achieves 75.2% top-1 accuracy under MobileNet settings (MobileNet V1/V2) while being 1.2× faster in measured GPU latency. Test code is available at this https URL
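The core idea above (a multinomial distribution per edge, sampling operations from it, reinforcing the distribution toward high-performing samples, and finally taking the argmax operation on each edge) can be sketched in a toy form. This is a hypothetical illustration, not the authors' implementation: the operation list, the number of edges, and the proportional-reward update rule are all assumptions made for clarity.

```python
import random

# Candidate operations and edge count are illustrative placeholders,
# not the search space used in the paper.
OPS = ["conv3x3", "conv5x5", "maxpool", "skip"]
NUM_EDGES = 3


def init_distribution():
    # Start with a uniform multinomial over operations for every edge.
    return [[1.0 / len(OPS)] * len(OPS) for _ in range(NUM_EDGES)]


def sample_architecture(dist, rng):
    # Draw one operation index per edge from its multinomial distribution.
    return [rng.choices(range(len(OPS)), weights=edge)[0] for edge in dist]


def update(dist, arch, reward, lr=0.1):
    # Shift probability mass toward the sampled operations in proportion
    # to the observed reward (a simple surrogate for the paper's
    # distribution-learning update), then renormalize each edge.
    for edge, op in zip(dist, arch):
        edge[op] += lr * reward
        total = sum(edge)
        for i in range(len(edge)):
            edge[i] /= total


def derive_architecture(dist):
    # Final architecture: the most probable operation on each edge.
    return [max(range(len(OPS)), key=lambda i: edge[i]) for edge in dist]
```

With a stub evaluator that rewards one operation more than the others, repeatedly calling `sample_architecture` and `update` concentrates probability mass on that operation, after which `derive_architecture` reads off the learned structure.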
URL
https://arxiv.org/abs/1905.07529