Neural architecture search (NAS) automates the design of neural network architectures, usually by exploring a large and therefore complex architecture search space. To advance architecture search, we present a graph diffusion-based NAS approach that uses discrete conditional graph diffusion processes to generate high-performing neural network architectures. We then propose a multi-conditioned classifier-free guidance approach, applied to graph diffusion networks, to jointly impose constraints such as high accuracy and low hardware latency. Unlike related work, our method is fully differentiable and requires only a single model training. In our evaluations, we show promising results on six standard benchmarks, yielding novel and unique architectures at high speed, i.e. less than 0.2 seconds per architecture. Furthermore, we demonstrate the generalisability and efficiency of our method through experiments on the ImageNet dataset.
https://arxiv.org/abs/2403.06020
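The multi-conditioned classifier-free guidance mentioned above can be sketched in a few lines. This is a hedged illustration of the general CFG combination rule applied to several conditions at once (e.g. accuracy and latency), not the paper's actual graph-diffusion implementation; the function name and toy vectors are invented for the example.

```python
def multi_cond_cfg(eps_uncond, eps_conds, weights):
    """Combine an unconditional noise estimate with several conditional
    estimates via per-condition guidance weights (classifier-free guidance
    generalized to multiple conditions). All inputs are flat float lists."""
    guided = list(eps_uncond)
    for eps_c, w in zip(eps_conds, weights):
        # each condition pushes the estimate along (conditional - unconditional)
        guided = [g + w * (c - u) for g, c, u in zip(guided, eps_c, eps_uncond)]
    return guided

# sanity check: a single condition with weight 1 recovers the conditional estimate
e_u = [0.0, 0.0, 0.0, 0.0]
e_c = [1.0, 1.0, 1.0, 1.0]
out = multi_cond_cfg(e_u, [e_c], [1.0])
```

With more than one condition, the weights trade off how strongly each constraint steers the generated architecture.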
We present ECToNAS, a cost-efficient evolutionary cross-topology neural architecture search algorithm that does not require any pre-trained meta controllers. Our framework is able to select suitable network architectures for different tasks and hyperparameter settings, independently performing cross-topology optimisation where required. It is a hybrid approach that fuses training and topology optimisation together into one lightweight, resource-friendly process. We demonstrate the validity and power of this approach with six standard data sets (CIFAR-10, CIFAR-100, EuroSAT, Fashion MNIST, MNIST, SVHN), showcasing the algorithm's ability to not only optimise the topology within an architectural type, but also to dynamically add and remove convolutional cells when and where required, thus crossing boundaries between different network types. This enables researchers without a background in machine learning to make use of appropriate model types and topologies and to apply machine learning methods in their domains, with a computationally cheap, easy-to-use cross-topology neural architecture search framework that fully encapsulates the topology optimisation within the training process.
https://arxiv.org/abs/2403.05123
The existing graph neural architecture search (GNAS) methods heavily rely on supervised labels during the search process, failing to handle ubiquitous scenarios where supervisions are not available. In this paper, we study the problem of unsupervised graph neural architecture search, which remains unexplored in the literature. The key problem is to discover the latent graph factors that drive the formation of graph data as well as the underlying relations between the factors and the optimal neural architectures. Handling this problem is challenging given that the latent graph factors together with architectures are highly entangled due to the nature of the graph and the complexity of the neural architecture search process. To address the challenge, we propose a novel Disentangled Self-supervised Graph Neural Architecture Search (DSGAS) model, which is able to discover the optimal architectures capturing various latent graph factors in a self-supervised fashion based on unlabeled graph data. Specifically, we first design a disentangled graph super-network capable of incorporating multiple architectures with factor-wise disentanglement, which are optimized simultaneously. Then, we estimate the performance of architectures under different factors by our proposed self-supervised training with joint architecture-graph disentanglement. Finally, we propose a contrastive search with architecture augmentations to discover architectures with factor-specific expertise. Extensive experiments on 11 real-world datasets demonstrate that the proposed model is able to achieve state-of-the-art performance against several baseline methods in an unsupervised manner.
https://arxiv.org/abs/2403.05064
Training-free metrics (a.k.a. zero-cost proxies) are widely used to avoid resource-intensive neural network training, especially in Neural Architecture Search (NAS). Recent studies show that existing training-free metrics have several limitations, such as limited correlation and poor generalisation across different search spaces and tasks. Hence, we propose Sample-Wise Activation Patterns and its derivative, SWAP-Score, a novel high-performance training-free metric. It measures the expressivity of networks over a batch of input samples. The SWAP-Score is strongly correlated with ground-truth performance across various search spaces and tasks, outperforming 15 existing training-free metrics on NAS-Bench-101/201/301 and TransNAS-Bench-101. The SWAP-Score can be further enhanced by regularisation, which leads to even higher correlations in cell-based search space and enables model size control during the search. For example, Spearman's rank correlation coefficient between regularised SWAP-Score and CIFAR-100 validation accuracies on NAS-Bench-201 networks is 0.90, significantly higher than 0.80 from the second-best metric, NWOT. When integrated with an evolutionary algorithm for NAS, our SWAP-NAS achieves competitive performance on CIFAR-10 and ImageNet in approximately 6 minutes and 9 minutes of GPU time respectively.
https://arxiv.org/abs/2403.04161
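The core idea of a sample-wise activation-pattern metric can be illustrated with a toy sketch: count how many distinct binary (on/off) ReLU patterns a batch of inputs induces at a layer. Note this is an assumption-laden simplification in the spirit of SWAP-Score, not the authors' exact formulation or its regularised variant.

```python
def swap_style_score(activations):
    """Training-free expressivity proxy: count the distinct sample-wise
    binary activation patterns over a batch. `activations` is a list of
    per-sample pre-activation rows; a unit is 'on' if its value is > 0.
    More distinct patterns suggests a more expressive network."""
    patterns = {tuple(x > 0 for x in row) for row in activations}
    return len(patterns)

# 4 samples, 2 units: patterns (T,F), (T,T), (F,F), (T,F) -> 3 distinct
acts = [[0.5, -1.0], [2.0, 0.3], [-0.1, -0.2], [0.9, -0.4]]
score = swap_style_score(acts)
```

In practice such a score is computed from a real network's activations on one forward pass, which is what makes it "zero-cost" relative to training.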
Neural network models have a number of hyperparameters that must be chosen along with their architecture. Choosing an architecture and assigning values to its hyperparameters can be a heavy burden on a novice user, so in most cases default hyperparameters and architectures are used. Significant improvements to model accuracy can be achieved through the evaluation of multiple architectures. A process known as Neural Architecture Search (NAS) may be applied to automatically evaluate a large number of such architectures. A system integrating open source tools for Neural Architecture Search (OpenNAS), in the classification of images, has been developed as part of this research. OpenNAS takes any dataset of grayscale or RGB images and generates Convolutional Neural Network (CNN) architectures based on a range of metaheuristics, using either AutoKeras, transfer learning, or a Swarm Intelligence (SI) approach. Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO) are used as the SI algorithms. Furthermore, models developed through such metaheuristics may be combined using stacking ensembles. In this paper, we focus on training and optimizing CNNs using the SI components of OpenNAS. The two major SI algorithms, PSO and ACO, are compared to see which is more effective in generating higher model accuracies. With our experimental design, the PSO algorithm is shown to perform better than ACO. The performance improvement of PSO is most notable on the more complex dataset. As a baseline, the performance of fine-tuned pre-trained models is also evaluated.
https://arxiv.org/abs/2403.03781
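For readers unfamiliar with the SI side, a minimal PSO loop over a continuous hyperparameter vector looks roughly like this. The velocity update and constants are the textbook ones, and the objective is a stand-in for validation error, not OpenNAS's actual architecture encoding.

```python
import random

def pso_minimize(f, dim, n_particles=10, iters=50, w=0.5, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimization over [0, 1]^dim. Each particle keeps
    a velocity, its personal best, and is attracted toward the global best."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# toy objective: squared distance to (0.7, 0.3), standing in for validation error
best, err = pso_minimize(lambda p: (p[0] - 0.7) ** 2 + (p[1] - 0.3) ** 2, dim=2)
```

In a NAS setting, `f` would decode the vector into a CNN, train briefly, and return the validation error, which is where nearly all the cost lives.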
Predictor-based methods have substantially enhanced Neural Architecture Search (NAS) optimization. The efficacy of these predictors is largely influenced by the method of encoding neural network architectures. While traditional encodings used an adjacency matrix describing the graph structure of a neural network, novel encodings embrace a variety of approaches, from unsupervised pretraining of latent representations to vectors of zero-cost proxies. In this paper, we categorize and investigate neural encodings of three main types: structural, learned, and score-based. Furthermore, we extend these encodings and introduce \textit{unified encodings}, which extend NAS predictors to multiple search spaces. Our analysis draws from experiments conducted on over 1.5 million neural network architectures on NAS spaces such as NASBench-101 (NB101), NB201, NB301, Network Design Spaces (NDS), and TransNASBench-101. Building on our study, we present our predictor \textbf{FLAN}: \textbf{Fl}ow \textbf{A}ttention for \textbf{N}AS. FLAN integrates critical insights on predictor design, transfer learning, and \textit{unified encodings} to enable more than an order of magnitude cost reduction for training NAS accuracy predictors. Our implementation and encodings for all neural networks are open-sourced at \href{this https URL}{this https URL\_nas}.
https://arxiv.org/abs/2403.02484
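A traditional structural encoding, as contrasted above with learned and score-based ones, can be sketched directly: flatten the cell's upper-triangular adjacency matrix and one-hot encode each node's operation. The operation set and cell below are illustrative, not taken from any particular benchmark.

```python
OPS = ["conv3x3", "conv1x1", "maxpool"]  # illustrative operation vocabulary

def structural_encoding(adj, ops):
    """Encode a DAG cell as a flat vector: upper-triangular adjacency bits
    followed by a one-hot vector per node operation (a classic structural
    encoding consumed by accuracy predictors)."""
    n = len(adj)
    vec = [adj[i][j] for i in range(n) for j in range(i + 1, n)]
    for op in ops:
        one_hot = [0] * len(OPS)
        one_hot[OPS.index(op)] = 1
        vec.extend(one_hot)
    return vec

# a 3-node cell where every earlier node feeds every later node
adj = [[0, 1, 1],
       [0, 0, 1],
       [0, 0, 0]]
enc = structural_encoding(adj, ["conv3x3", "maxpool", "conv1x1"])
```

The vector has 3 adjacency bits plus 3 one-hot blocks of width 3, i.e. length 12; a predictor (MLP, GNN, etc.) then regresses accuracy from such vectors.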
Efficient deployment of neural networks (NN) requires the co-optimization of accuracy and latency. For example, hardware-aware neural architecture search has been used to automatically find NN architectures that satisfy a latency constraint on a specific hardware device. Central to these search algorithms is a prediction model that is designed to provide a hardware latency estimate for a candidate NN architecture. Recent research has shown that the sample efficiency of these predictive models can be greatly improved through pre-training on some \textit{training} devices with many samples, and then transferring the predictor to the \textit{test} (target) device. Transfer learning and meta-learning methods have been used for this, but often exhibit significant performance variability. Additionally, the evaluation of existing latency predictors has been largely done on hand-crafted training/test device sets, making it difficult to ascertain design features that compose a robust and general latency predictor. To address these issues, we introduce a comprehensive suite of latency prediction tasks obtained in a principled way through automated partitioning of hardware device sets. We then design a general latency predictor to comprehensively study (1) the predictor architecture, (2) NN sample selection methods, (3) hardware device representations, and (4) NN operation encoding schemes. Building on conclusions from our study, we present an end-to-end latency predictor training strategy that outperforms existing methods on 11 out of 12 difficult latency prediction tasks, improving latency prediction by 22.5\% on average, and up to 87.6\% on the hardest tasks. Focusing on latency prediction, our HW-Aware NAS reports a $5.8\times$ speedup in wall-clock time. Our code is available on \href{this https URL}{this https URL\_latency}.
https://arxiv.org/abs/2403.02446
Video motion magnification is a technique to capture and amplify subtle motion in a video that is invisible to the naked eye. The deep learning-based prior work successfully demonstrates the modelling of the motion magnification problem with outstanding quality compared to conventional signal processing-based ones. However, it still lags behind real-time performance, which prevents it from being extended to various online applications. In this paper, we investigate an efficient deep learning-based motion magnification model that runs in real time for full-HD resolution videos. Due to the specified network design of the prior art, i.e. its inhomogeneous architecture, the direct application of existing neural architecture search methods is complicated. Instead of automatic search, we carefully investigate the architecture module by module for its role and importance in the motion magnification task. Two key findings are that 1) reducing the spatial resolution of the latent motion representation in the decoder provides a good trade-off between computational efficiency and task quality, and 2) surprisingly, only a single linear layer and a single branch in the encoder are sufficient for the motion magnification task. Based on these findings, we introduce a real-time deep learning-based motion magnification model with 4.2x fewer FLOPs that is 2.7x faster than the prior art while maintaining comparable quality.
https://arxiv.org/abs/2403.01898
As machine learning (ML) algorithms get deployed in an ever-increasing number of applications, these algorithms need to achieve better trade-offs between high accuracy, high throughput and low latency. This paper introduces NASH, a novel approach that applies neural architecture search to machine learning hardware. Using NASH, hardware designs can achieve not only high throughput and low latency but also superior accuracy. We present four versions of the NASH strategy in this paper, all of which show higher accuracy than the original models. The strategy can be applied to various convolutional neural networks, selecting specific model operations among many to guide the training process toward higher accuracy. Experimental results show that applying NASH to ResNet18 or ResNet34 achieves a top-1 accuracy increase of up to 3.1% and a top-5 accuracy increase of up to 2.2% compared to the non-NASH version when tested on the ImageNet data set. We also integrated this approach into the FINN hardware model synthesis tool to automate the application of our approach and the generation of the hardware model. Results show that using FINN can achieve a maximum throughput of 324.5 fps. In addition, NASH models can also result in a better trade-off between accuracy and hardware resource utilization. The accuracy-hardware (HW) Pareto curve shows that the models with the four NASH versions represent the best trade-offs, achieving the highest accuracy for a given HW utilization. The code for our implementation is open-source and publicly available on GitHub at this https URL.
https://arxiv.org/abs/2403.01845
Neural Architecture Search (NAS), aiming at automatically designing neural architectures by machines, has been considered a key step toward automatic machine learning. One notable NAS branch is weight-sharing NAS, which significantly improves search efficiency and allows NAS algorithms to run on ordinary computers. Despite receiving high expectations, this category of methods suffers from low search effectiveness. By employing a generalization boundedness tool, we demonstrate that the devil behind this drawback is untrustworthy architecture rating caused by the oversized search space of possible architectures. Addressing this problem, we modularize a large search space into blocks with small search spaces and develop a family of models with the distilling neural architecture (DNA) techniques. These proposed models, namely a DNA family, are capable of resolving multiple dilemmas of weight-sharing NAS, such as scalability, efficiency, and multi-modal compatibility. Our proposed DNA models can rate all architecture candidates, as opposed to previous works that can only access a sub-search space using heuristic algorithms. Moreover, under a certain computational complexity constraint, our method can seek architectures with different depths and widths. Extensive experimental evaluations show that our models achieve state-of-the-art top-1 accuracy of 78.9% and 83.6% on ImageNet for a mobile convolutional network and a small vision transformer, respectively. Additionally, we provide in-depth empirical analysis and insights into neural architecture ratings. Codes available: \url{this https URL}.
https://arxiv.org/abs/2403.01326
Spiking Neural Networks (SNNs) are based on a more biologically inspired approach than usual artificial neural networks. Such models are characterized by complex dynamics between neurons and spikes, which are very sensitive to the hyperparameters, making their optimization challenging. To tackle hyperparameter optimization of SNNs, we first extended the signal-loss issue of SNNs to what we call silent networks: networks that fail to emit enough spikes at their outputs due to mistuned hyperparameters or architecture. Generally, search spaces are heavily restrained, sometimes even discretized, to prevent the sampling of such networks. By defining an early-stopping criterion that detects silent networks and by designing specific constraints, we were able to instantiate larger and more flexible search spaces. We applied a constrained Bayesian optimization technique, asynchronously parallelized because the evaluation time of an SNN is highly stochastic. Large-scale experiments were carried out on a multi-GPU petascale architecture. By leveraging silent networks, results show an acceleration of the search, while maintaining good performance of both the optimization algorithm and the best solution obtained. We were able to apply our methodology to two popular training algorithms, known as spike-timing-dependent plasticity and surrogate gradient. Early detection allowed us to prevent worthless and costly computation, directing the search toward promising hyperparameter combinations. Our methodology could be applied to multi-objective problems, where spiking activity is often minimized to reduce energy consumption. In this scenario, it becomes essential to find the delicate frontier between low-spiking and silent networks. Finally, our approach may have implications for neural architecture search, particularly in defining suitable spiking architectures.
https://arxiv.org/abs/2403.00450
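The early-stopping idea can be sketched as a cheap pre-check before full evaluation: if a candidate's outputs stay silent over a few warm-up batches, abort before paying for training. The spike threshold and warm-up scheme below are illustrative assumptions, not the thesis's exact criterion and constraints.

```python
def is_silent(output_spike_counts, min_spikes=1):
    """Flag a candidate SNN whose output neurons emit fewer than `min_spikes`
    total spikes over a batch -- the 'silent network' failure mode.
    (The threshold here is illustrative.)"""
    return sum(output_spike_counts) < min_spikes

def worth_full_evaluation(simulate_one_batch, n_warmup_batches=3):
    """Early-stopping pre-check: run a few cheap warm-up batches and abort the
    costly full evaluation as soon as the network is detected to be silent."""
    return all(not is_silent(simulate_one_batch()) for _ in range(n_warmup_batches))

# toy stand-ins for a simulation step returning per-output-neuron spike counts
silent_sim = lambda: [0, 0, 0, 0]
active_sim = lambda: [3, 0, 1, 2]
skip = worth_full_evaluation(silent_sim)   # silent -> not worth full evaluation
keep = worth_full_evaluation(active_sim)
```

Within a Bayesian optimization loop, candidates failing this check can return a constraint-violation signal instead of a fitness value, steering the surrogate away from silent regions.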
Neural Architecture Search (NAS) paves the way for the automatic definition of Neural Network (NN) architectures, attracting increasing research attention and offering solutions in various scenarios. This study introduces a novel NAS solution, called Flat Neural Architecture Search (FlatNAS), which explores the interplay between a novel figure of merit based on robustness to weight perturbations and single NN optimization with Sharpness-Aware Minimization (SAM). FlatNAS is the first work in the literature to systematically explore flat regions in the loss landscape of NNs in a NAS procedure, while jointly optimizing their performance on in-distribution data, their out-of-distribution (OOD) robustness, and constraining the number of parameters in their architecture. Differently from current studies primarily concentrating on OOD algorithms, FlatNAS successfully evaluates the impact of NN architectures on OOD robustness, a crucial aspect in real-world applications of machine and deep learning. FlatNAS achieves a good trade-off between performance, OOD generalization, and the number of parameters, by using only in-distribution data in the NAS exploration. The OOD robustness of the NAS-designed models is evaluated by focusing on robustness to input data corruptions, using popular benchmark datasets in the literature.
https://arxiv.org/abs/2402.19102
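Since FlatNAS builds on Sharpness-Aware Minimization, a bare-bones SAM step on a plain parameter vector may help fix intuitions: first ascend within an L2 ball to probe the sharpest nearby point, then take the descent step at the original parameters using the perturbed gradient. This is the standard two-step SAM rule, not FlatNAS itself.

```python
def sam_step(params, grad_fn, rho=0.05, lr=0.1):
    """One Sharpness-Aware Minimization step on a list-of-floats parameter
    vector: (1) move to the worst-case point within an L2 ball of radius rho,
    (2) descend at the original point using the gradient from that point."""
    g = grad_fn(params)
    norm = sum(x * x for x in g) ** 0.5 or 1.0
    perturbed = [p + rho * gi / norm for p, gi in zip(params, g)]
    g_sharp = grad_fn(perturbed)  # gradient at the sharpness-probing point
    return [p - lr * gi for p, gi in zip(params, g_sharp)]

# toy quadratic loss 0.5 * ||w||^2, whose gradient is w itself
grad = lambda w: list(w)
w = [1.0, -2.0]
w_new = sam_step(w, grad)
```

Flat minima are those where `g_sharp` stays close to `g`; FlatNAS's figure of merit rewards architectures that remain accurate under such weight perturbations.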
Building efficient neural network architectures can be a time-consuming task requiring extensive expert knowledge. This task becomes particularly challenging for edge devices because one has to consider parameters such as power consumption during inferencing, model size, inferencing speed, and CO2 emissions. In this article, we introduce a novel framework designed to automatically discover new neural network architectures based on user-defined parameters, an expert system, and an LLM trained on a large amount of open-domain knowledge. The introduced framework (LeMo-NADe) is tailored to be used by non-AI experts, does not require a predetermined neural architecture search space, and considers a large set of edge device-specific parameters. We implement and validate this proposed neural architecture discovery framework using CIFAR-10, CIFAR-100, and ImageNet16-120 datasets while using GPT-4 Turbo and Gemini as the LLM component. We observe that the proposed framework can rapidly (within hours) discover intricate neural network models that perform extremely well across a diverse set of application settings defined by the user.
https://arxiv.org/abs/2402.18443
Pareto front profiling in multi-objective optimization (MOO), i.e. finding a diverse set of Pareto optimal solutions, is challenging, especially with expensive objectives like neural network training. Typically, in MOO neural architecture search (NAS), we aim to balance performance and hardware metrics across devices. Prior NAS approaches simplify this task by incorporating hardware constraints into the objective function, but profiling the Pareto front necessitates a search for each constraint. In this work, we propose a novel NAS algorithm that encodes user preferences for the trade-off between performance and hardware metrics, and yields representative and diverse architectures across multiple devices in just one search run. To this end, we parameterize the joint architectural distribution across devices and multiple objectives via a hypernetwork that can be conditioned on hardware features and preference vectors, enabling zero-shot transferability to new devices. Extensive experiments with up to 19 hardware devices and 3 objectives showcase the effectiveness and scalability of our method. Finally, we show that, without additional costs, our method outperforms existing MOO NAS methods across qualitatively different search spaces and datasets, including MobileNetV3 on ImageNet-1k and a Transformer space on machine translation.
https://arxiv.org/abs/2402.18213
Deploying complicated neural network models on hardware with limited resources is critical. This paper proposes a novel model quantization method, named Low-Cost Proxy-Based Adaptive Mixed-Precision Model Quantization (LCPAQ), which contains three key modules. The hardware-aware module is designed by considering the hardware limitations, while an adaptive mixed-precision quantization module is developed to evaluate quantization sensitivity using Hessian-matrix and Pareto-frontier techniques. Integer linear programming is used to fine-tune the quantization across different layers. Then the low-cost proxy neural architecture search module efficiently explores the ideal quantization hyperparameters. Experiments on ImageNet demonstrate that the proposed LCPAQ achieves comparable or superior quantization accuracy to existing mixed-precision models. Notably, LCPAQ requires only 1/200 of the search time of existing methods, providing a shortcut for practical quantization on resource-limited devices.
https://arxiv.org/abs/2402.17706
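The layer-wise bitwidth assignment that the integer-linear-programming step solves can be illustrated with a brute-force toy: pick a bitwidth per layer minimizing a sensitivity-weighted penalty under a model-size budget. The `sensitivity / bits` cost and the budget below are made-up stand-ins for the paper's Hessian-based objective and constraints.

```python
from itertools import product

def allocate_bits(sensitivities, sizes, bit_choices, budget):
    """Toy mixed-precision allocation: choose a bitwidth per layer minimizing
    sum(sensitivity / bits) subject to sum(size * bits) <= budget.
    Brute force stands in for the paper's integer linear program."""
    best, best_cost = None, float("inf")
    for assign in product(bit_choices, repeat=len(sensitivities)):
        size = sum(s * b for s, b in zip(sizes, assign))
        if size > budget:
            continue  # violates the model-size constraint
        cost = sum(sens / b for sens, b in zip(sensitivities, assign))
        if cost < best_cost:
            best, best_cost = assign, cost
    return best, best_cost

# three layers: sensitivity (e.g. Hessian-based), parameter count, bit options
bits, cost = allocate_bits(
    sensitivities=[10.0, 1.0, 5.0],
    sizes=[100, 100, 100],
    bit_choices=(2, 4, 8),
    budget=100 * (2 + 4 + 8),
)
```

As expected, the solver gives the most bits to the most sensitive layer and the fewest to the least sensitive one; real solvers (and the paper's ILP) scale this to many layers where enumeration is infeasible.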
Developing robust and interpretable vision systems is a crucial step towards trustworthy artificial intelligence. In this regard, a promising paradigm considers embedding task-required invariant structures, e.g., geometric invariance, in the fundamental image representation. However, such invariant representations typically exhibit limited discriminability, limiting their applications in larger-scale trustworthy vision tasks. For this open problem, we conduct a systematic investigation of hierarchical invariance, exploring this topic from theoretical, practical, and application perspectives. At the theoretical level, we show how to construct over-complete invariants with a Convolutional Neural Networks (CNN)-like hierarchical architecture yet in a fully interpretable manner. The general blueprint, specific definitions, invariant properties, and numerical implementations are provided. At the practical level, we discuss how to customize this theoretical framework into a given task. With the over-completeness, discriminative features w.r.t. the task can be adaptively formed in a Neural Architecture Search (NAS)-like manner. We demonstrate the above arguments with accuracy, invariance, and efficiency results on texture, digit, and parasite classification experiments. Furthermore, at the application level, our representations are explored in real-world forensics tasks on adversarial perturbations and Artificial Intelligence Generated Content (AIGC). Such applications reveal that the proposed strategy not only realizes the theoretically promised invariance, but also exhibits competitive discriminability even in the era of deep learning. For robust and interpretable vision tasks at larger scales, hierarchical invariant representation can be considered as an effective alternative to traditional CNN and invariants.
https://arxiv.org/abs/2402.15430
Miniaturized autonomous unmanned aerial vehicles (UAVs) are gaining popularity due to their small size, enabling new tasks such as indoor navigation or people monitoring. Nonetheless, their size and simple electronics pose severe challenges in implementing advanced onboard intelligence. This work proposes a new automatic optimization pipeline for visual pose estimation tasks using Deep Neural Networks (DNNs). The pipeline leverages two different Neural Architecture Search (NAS) algorithms to pursue a vast complexity-driven exploration in the DNNs' architectural space. The obtained networks are then deployed on an off-the-shelf nano-drone equipped with a parallel ultra-low power System-on-Chip leveraging a set of novel software kernels for the efficient fused execution of critical DNN layer sequences. Our results improve the state-of-the-art reducing inference latency by up to 3.22x at iso-error.
https://arxiv.org/abs/2402.15273
Most previous approaches to Time Series Classification (TSC) highlight the significance of receptive fields and frequencies while overlooking time resolution. Hence, they unavoidably suffer from scalability issues, as they integrate an extensive range of receptive fields into their classification models. Other methods, while adapting better to large datasets, require manual design and still cannot reach the optimal architecture due to the uniqueness of each dataset. We overcome these challenges by proposing a novel multi-scale search space and a framework for Neural Architecture Search (NAS) that addresses both the frequency and time-resolution problems, discovering the suitable scale for a specific dataset. We further show that our model can serve as a backbone to employ a powerful Transformer module with both untrained and pre-trained weights. Our search space reaches state-of-the-art performance on four datasets from four different domains, while introducing more than ten highly fine-tuned models for each dataset.
Most previous approaches to Time Series Classification (TSC) emphasize the importance of receptive fields and frequencies while overlooking time resolution, and therefore suffer from scalability issues because they integrate an extensive range of receptive fields into their classification models. Other methods, while better suited to large datasets, require manual design and still cannot reach the optimal architecture due to the uniqueness of each dataset. We address these problems by proposing a novel multi-scale search space and a Neural Architecture Search (NAS) framework that tackles both frequency and time resolution and discovers the suitable scale for a specific dataset. We further show that our model can serve as a backbone for a powerful Transformer module, with either untrained or pre-trained weights. Our search space reaches state-of-the-art performance on four datasets from four different domains, while introducing more than ten highly fine-tuned models for each dataset.
https://arxiv.org/abs/2402.13822
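To illustrate what varying "time resolution" means in the entry above, here is a minimal sketch that mean-pools a 1-D series down to several scales and extracts simple statistics at each. The paper's actual multi-scale search space is far richer; the function names and the chosen statistics below are illustrative assumptions.

```python
import numpy as np

def avg_pool(x, scale):
    """Downsample a 1-D series by mean pooling to coarsen its time resolution."""
    n = len(x) // scale * scale          # drop the tail that doesn't fit
    return x[:n].reshape(-1, scale).mean(axis=1)

def multi_scale_features(x, scales=(1, 2, 4)):
    """Extract simple statistics at several time resolutions; a NAS
    framework would instead search over which scales to keep."""
    feats = []
    for s in scales:
        xs = avg_pool(x, s)
        feats.extend([xs.mean(), xs.std(), np.abs(np.diff(xs)).mean()])
    return np.array(feats)
```

The point of the search is that the right set of `scales` differs per dataset, which is exactly what a hand-designed model cannot anticipate.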
A significant hurdle in the noisy intermediate-scale quantum (NISQ) era is identifying functional quantum circuits. These circuits must also adhere to the constraints imposed by current quantum hardware limitations. Variational quantum algorithms (VQAs), a class of quantum-classical optimization algorithms, were developed to address these challenges on currently available quantum devices. However, the overall performance of VQAs depends on the initialization strategy of the variational circuit, the structure of the circuit (also known as the ansatz), and the configuration of the cost function. Focusing on the structure of the circuit, in this thesis we improve the performance of VQAs by automating the search for an optimal structure for the variational circuits using reinforcement learning (RL). Within the thesis, the optimality of a circuit is determined by evaluating its depth, the overall count of gates and parameters, and its accuracy in solving the given problem. The task of automating the search for optimal quantum circuits is known as quantum architecture search (QAS). The majority of research in QAS is focused on the noiseless scenario; the impact of noise on QAS remains inadequately explored. In this thesis, we tackle the issue by introducing a tensor-based quantum circuit encoding, restrictions on environment dynamics to explore the search space of possible circuits efficiently, an episode halting scheme to steer the agent toward shorter circuits, and a double deep Q-network (DDQN) with an $\epsilon$-greedy policy for better stability. Numerical experiments on noiseless and noisy quantum hardware show that, in dealing with various VQAs, our RL-based QAS outperforms existing QAS approaches. Meanwhile, the methods we propose in the thesis can be readily adapted to address a wide range of other VQAs.
A major challenge in the noisy intermediate-scale quantum (NISQ) era is identifying functional quantum circuits, which must also comply with the constraints of current quantum hardware. Variational quantum algorithms (VQAs), a class of quantum-classical optimization algorithms, were developed to address these challenges on currently available quantum devices. However, the overall performance of VQAs depends on the initialization strategy of the variational circuit, the structure of the circuit (also known as the ansatz), and the configuration of the cost function. Focusing on circuit structure, this thesis improves the performance of VQAs by automating the search for an optimal variational-circuit structure using reinforcement learning (RL). Here, the optimality of a circuit is assessed by its depth, the overall count of gates and parameters, and its accuracy in solving the given problem. The task of automating this search is known as quantum architecture search (QAS). Most QAS research focuses on the noiseless scenario, and the impact of noise on QAS remains inadequately explored. This thesis tackles the issue by introducing a tensor-based quantum circuit encoding, restrictions on environment dynamics to explore the search space of possible circuits efficiently, an episode halting scheme to steer the agent toward shorter circuits, and a double deep Q-network (DDQN) with an $\epsilon$-greedy policy for better stability. Numerical experiments on noiseless and noisy quantum hardware show that our RL-based QAS outperforms existing QAS approaches across various VQAs, and the proposed methods can be readily adapted to a wide range of other VQAs.
https://arxiv.org/abs/2402.13754
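The DDQN and $\epsilon$-greedy components named in the abstract above are standard RL building blocks. A minimal sketch of the two core computations (generic, not the thesis's implementation) looks like:

```python
import random
import numpy as np

def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))

def ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double-DQN target: the online net selects the next action, the target
    net evaluates it, which reduces Q-value overestimation bias."""
    if done:
        return reward
    a = int(np.argmax(next_q_online))     # action chosen by the online net
    return reward + gamma * next_q_target[a]  # value from the target net
```

In a QAS setting the "actions" would be gate placements and an episode would end when the halting scheme fires or the circuit solves the task to tolerance.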
With the recent growth in demand for large-scale deep neural networks, compute-in-memory (CiM) has emerged as a prominent solution to alleviate the bandwidth and on-chip interconnect bottlenecks that constrain von Neumann architectures. However, the construction of CiM hardware poses a challenge, as any specific memory hierarchy in terms of cache sizes and memory bandwidth at different interfaces may not be ideally matched to a given neural network's attributes, such as tensor dimensions and arithmetic intensity, leading to suboptimal and under-performing systems. Despite the success of neural architecture search (NAS) techniques in yielding efficient sub-networks for a given hardware metric budget (e.g., DNN execution time or latency), these techniques assume the hardware configuration to be frozen, often yielding suboptimal sub-networks for a given budget. In this paper, we present CiMNet, a framework that jointly searches for optimal sub-networks and hardware configurations for CiM architectures, creating a Pareto-optimal frontier of downstream task accuracy and execution metrics (e.g., latency). The proposed framework can comprehend the complex interplay between a sub-network's performance and the CiM hardware configuration choices, including bandwidth, processing element size, and memory size. Exhaustive experiments on different model architectures from both the CNN and Transformer families demonstrate the efficacy of CiMNet in finding co-optimized sub-networks and CiM hardware configurations. Specifically, for ImageNet classification accuracy similar to the baseline ViT-B, optimizing only the model architecture increases performance (or reduces workload execution time) by 1.7x, while optimizing both the model architecture and the hardware configuration increases it by 3.1x.
With the recent growth in demand for large-scale deep neural networks, compute-in-memory (CiM) has emerged as a prominent solution to the bandwidth and on-chip interconnect bottlenecks that constrain von Neumann architectures. However, building CiM hardware is challenging because a specific memory hierarchy (cache sizes and memory bandwidth at different interfaces) may not ideally match a given neural network's attributes, such as tensor dimensions and arithmetic intensity, leading to suboptimal, under-performing systems. Although neural architecture search (NAS) techniques successfully yield efficient sub-networks for a given hardware metric budget (e.g., DNN execution time or latency), they assume a frozen hardware configuration and therefore often produce suboptimal sub-networks for a given budget. In this paper, we present CiMNet, a framework that jointly searches for optimal sub-networks and hardware configurations for CiM architectures, creating a Pareto-optimal frontier of downstream task accuracy and execution metrics (e.g., latency). Exhaustive experiments on model architectures from both the CNN and Transformer families demonstrate CiMNet's effectiveness in finding co-optimized sub-networks and CiM hardware configurations. Specifically, for ImageNet classification accuracy similar to the baseline ViT-B, optimizing the model architecture alone improves performance (i.e., reduces workload execution time) by 1.7x, while optimizing both the model architecture and the hardware configuration improves it by 3.1x.
https://arxiv.org/abs/2402.11780
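The Pareto-optimal frontier of accuracy and latency mentioned in the entry above can be computed with a simple dominance check; the sketch below is generic, not CiMNet's code.

```python
def pareto_front(points):
    """Return the Pareto-optimal subset of (accuracy, latency) pairs:
    higher accuracy is better, lower latency is better. A point is kept
    if no other point is at least as good on both axes."""
    front = []
    for acc, lat in points:
        dominated = any(a >= acc and l <= lat and (a, l) != (acc, lat)
                        for a, l in points)
        if not dominated:
            front.append((acc, lat))
    return sorted(front)
```

In a joint search like CiMNet's, each point would correspond to one (sub-network, hardware configuration) pair, and the frontier is what gets reported to the user to pick an operating point from.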