Abstract
Neural architecture search (NAS) has recently attracted much research attention because of its ability to identify architectures better than handcrafted ones. However, many NAS methods, which optimize over a discrete search space, need many GPU days to converge. Recently, DARTS, which constructs a differentiable search space and then optimizes it by gradient descent, has been shown to obtain high-performance architectures while reducing the search time to several days. However, DARTS is still slow because it updates an ensemble of all operations and keeps only one after convergence. Moreover, DARTS can converge to inferior architectures due to the strong correlation among operations. In this paper, we propose a new differentiable Neural Architecture Search method based on Proximal gradient descent (denoted NASP). Unlike DARTS, NASP reformulates the search process as an optimization problem with a constraint that only one operation is allowed to be updated during forward and backward propagation. Since the constraint is hard to deal with, we propose a new algorithm inspired by proximal iterations to solve it. Experiments on various tasks demonstrate that NASP can obtain high-performance architectures with a tenfold speedup in computation time over DARTS.
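As a rough illustration of the idea sketched in the abstract, here is a minimal Python sketch of a proximal step under the assumption that the constraint requires the architecture weights of an edge to have exactly one nonzero entry (so only one operation is active during forward and backward propagation); with that assumption the proximal operator reduces to keeping the largest entry and zeroing the rest. The function name `prox_one_hot` and the example weights are illustrative, not taken from the paper.

```python
import numpy as np

def prox_one_hot(alpha):
    """Proximal step onto the set of vectors with exactly one nonzero entry.

    Under the stated assumption, this keeps the largest architecture weight
    and zeros out the others, so only the selected operation participates
    in forward and backward propagation for this edge.
    """
    projected = np.zeros_like(alpha)
    k = int(np.argmax(alpha))      # index of the operation to keep
    projected[k] = alpha[k]        # retain its continuous weight
    return projected

# Hypothetical usage: continuous weights over four candidate operations.
alpha = np.array([0.10, 0.70, 0.15, 0.05])
print(prox_one_hot(alpha))  # [0.  0.7 0.  0. ]
```

The continuous weights would still be updated by gradient descent between such proximal steps; this sketch only shows the projection itself.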
URL
https://arxiv.org/abs/1905.13577