Image segmentation is one of the most fundamental problems in computer vision and has drawn considerable attention due to its vast applications in image understanding and autonomous driving. However, designing effective and efficient segmentation neural architectures is a labor-intensive process that may require many trials by human experts. In this paper, we address the challenge of efficiently integrating multi-head self-attention into high-resolution representation CNNs by leveraging architecture search. Manually replacing convolution layers with multi-head self-attention is non-trivial due to the costly memory overhead of maintaining high resolution. By contrast, we develop a multi-target multi-branch supernet method, which not only fully utilizes the advantages of high-resolution features, but also finds the proper locations for placing multi-head self-attention modules. Our search algorithm is optimized towards multiple objectives (e.g., latency and mIoU) and is capable of finding architectures on the Pareto frontier with an arbitrary number of branches in a single search. We further present a series of models found via Hybrid Convolutional-Transformer Architecture Search (HyCTAS), a method that searches for the best hybrid combination of lightweight convolution layers and memory-efficient self-attention layers between branches at different resolutions, fusing them to high resolution for both efficiency and effectiveness. Extensive experiments demonstrate that HyCTAS outperforms previous methods on the semantic segmentation task. Code and models are available at \url{this https URL}.
https://arxiv.org/abs/2403.10413
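The multi-objective selection the abstract describes can be sketched as a minimal Pareto-frontier filter over (latency, mIoU) pairs; the candidate values below are illustrative, not taken from the paper.

```python
def pareto_front(candidates):
    """Keep candidates not dominated by any other.
    Each candidate is (latency_ms, miou); lower latency and higher mIoU are better."""
    front = []
    for lat, miou in candidates:
        dominated = any(
            (l <= lat and m >= miou) and (l < lat or m > miou)
            for l, m in candidates
        )
        if not dominated:
            front.append((lat, miou))
    return front

# Hypothetical architectures sampled from a supernet
archs = [(12.0, 0.71), (15.0, 0.74), (15.5, 0.73), (20.0, 0.78), (25.0, 0.77)]
print(pareto_front(archs))  # [(12.0, 0.71), (15.0, 0.74), (20.0, 0.78)]
```

A single search run can then return every architecture on this frontier, rather than one architecture per latency constraint.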
Benchmarking plays a pivotal role in assessing and enhancing the performance of compact deep learning models designed for execution on resource-constrained devices, such as microcontrollers. Our study introduces a novel, entirely artificially generated benchmarking dataset tailored for speech recognition, representing a core challenge in the field of tiny deep learning. SpokeN-100 consists of the numbers 0 to 99 spoken by 32 different speakers in four different languages, namely English, Mandarin, German and French, resulting in 12,800 audio samples. We extract auditory features and use UMAP (Uniform Manifold Approximation and Projection for Dimension Reduction) as a dimensionality reduction method to show the diversity and richness of the dataset. To highlight the use case of the dataset, we introduce two benchmark tasks: given an audio sample, classify (i) the language used and/or (ii) the spoken number. We optimized state-of-the-art deep neural networks and performed an evolutionary neural architecture search to find tiny architectures optimized for the 32-bit ARM Cortex-M4 nRF52840 microcontroller. Our results represent the first benchmark data achieved for SpokeN-100.
https://arxiv.org/abs/2403.09753
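As a quick sanity check, the stated sample count follows directly from the dataset composition (100 numbers, 32 speakers, 4 languages):

```python
# SpokeN-100 composition as described in the abstract
numbers = list(range(100))
speakers = 32
languages = ["English", "Mandarin", "German", "French"]
total_samples = len(numbers) * speakers * len(languages)
print(total_samples)  # 12800
```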
Sparse neural networks have shown similar or better generalization performance than their dense counterparts while having higher parameter efficiency. This has motivated a number of works to learn, induce, or search for high performing sparse networks. While reports of quality or efficiency gains are impressive, standard baselines are lacking, hindering reliable comparability and reproducibility across methods. In this work, we provide an evaluation approach and a naive Random Search baseline method for finding good sparse configurations. We apply Random Search on the node space of an overparameterized network with the goal of finding better initialized sparse sub-networks that are positioned more advantageously in the loss landscape. We record the post-training performance of sparse networks at various levels of sparsity and compare against both their fully connected parent networks and random sparse configurations at the same sparsity levels. We observe that for this architecture search task, initialized sparse networks found by Random Search neither perform better nor converge more efficiently than their random counterparts. Thus we conclude that Random Search may be viewed as a suitable neutral baseline for sparsity search methods.
https://arxiv.org/abs/2403.08265
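The random sparse configurations used as comparison points can be sketched as uniformly sampled binary masks at a target sparsity level; this is an illustrative sketch, not the authors' code.

```python
import numpy as np

def random_sparse_mask(shape, sparsity, seed=0):
    """Sample a binary mask with the given fraction of entries zeroed out."""
    rng = np.random.default_rng(seed)
    size = int(np.prod(shape))
    n_zero = int(round(sparsity * size))
    mask = np.ones(size, dtype=np.int8)
    mask[rng.choice(size, n_zero, replace=False)] = 0
    return mask.reshape(shape)

# A 90%-sparse mask for a hypothetical 64x64 weight matrix
mask = random_sparse_mask((64, 64), sparsity=0.9)
print(mask.mean())  # ~0.1 of the weights remain
```

Applying such a mask element-wise to a layer's weights at initialization yields the random sparse sub-networks that the searched configurations are compared against.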
Neural architecture search automates the design of neural network architectures, usually by exploring a large and thus complex architecture search space. To advance architecture search, we present a graph diffusion-based NAS approach that uses discrete conditional graph diffusion processes to generate high-performing neural network architectures. We then propose a multi-conditioned classifier-free guidance approach applied to graph diffusion networks to jointly impose constraints such as high accuracy and low hardware latency. Unlike the related work, our method is completely differentiable and requires only a single model training. In our evaluations, we show promising results on six standard benchmarks, yielding novel and unique architectures at a fast speed, i.e., less than 0.2 seconds per architecture. Furthermore, we demonstrate the generalisability and efficiency of our method through experiments on the ImageNet dataset.
https://arxiv.org/abs/2403.06020
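Multi-conditioned classifier-free guidance can be illustrated numerically: the single-condition update eps_uncond + w * (eps_cond - eps_uncond) extends additively over several conditions (e.g. accuracy and latency). This is one plausible reading of the multi-conditioned variant, not the paper's exact formulation.

```python
import numpy as np

def multi_cond_cfg(eps_uncond, eps_conds, weights):
    """Combine the unconditional prediction with several conditional ones,
    one guidance weight per condition."""
    out = np.asarray(eps_uncond, dtype=float).copy()
    for eps_c, w in zip(eps_conds, weights):
        out += w * (np.asarray(eps_c, dtype=float) - eps_uncond)
    return out

# Two hypothetical conditions (accuracy, latency) with different weights
guided = multi_cond_cfg([0.0], eps_conds=[[1.0], [2.0]], weights=[0.5, 0.25])
print(guided)  # [1.]
```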
We present ECToNAS, a cost-efficient evolutionary cross-topology neural architecture search algorithm that does not require any pre-trained meta controllers. Our framework is able to select suitable network architectures for different tasks and hyperparameter settings, independently performing cross-topology optimisation where required. It is a hybrid approach that fuses training and topology optimisation together into one lightweight, resource-friendly process. We demonstrate the validity and power of this approach with six standard data sets (CIFAR-10, CIFAR-100, EuroSAT, Fashion MNIST, MNIST, SVHN), showcasing the algorithm's ability to not only optimise the topology within an architectural type, but also to dynamically add and remove convolutional cells when and where required, thus crossing boundaries between different network types. This enables researchers without a background in machine learning to make use of appropriate model types and topologies and to apply machine learning methods in their domains, with a computationally cheap, easy-to-use cross-topology neural architecture search framework that fully encapsulates the topology optimisation within the training process.
https://arxiv.org/abs/2403.05123
The existing graph neural architecture search (GNAS) methods heavily rely on supervised labels during the search process, failing to handle ubiquitous scenarios where supervisions are not available. In this paper, we study the problem of unsupervised graph neural architecture search, which remains unexplored in the literature. The key problem is to discover the latent graph factors that drive the formation of graph data as well as the underlying relations between the factors and the optimal neural architectures. Handling this problem is challenging given that the latent graph factors together with architectures are highly entangled due to the nature of the graph and the complexity of the neural architecture search process. To address the challenge, we propose a novel Disentangled Self-supervised Graph Neural Architecture Search (DSGAS) model, which is able to discover the optimal architectures capturing various latent graph factors in a self-supervised fashion based on unlabeled graph data. Specifically, we first design a disentangled graph super-network capable of incorporating multiple architectures with factor-wise disentanglement, which are optimized simultaneously. Then, we estimate the performance of architectures under different factors by our proposed self-supervised training with joint architecture-graph disentanglement. Finally, we propose a contrastive search with architecture augmentations to discover architectures with factor-specific expertise. Extensive experiments on 11 real-world datasets demonstrate that the proposed model is able to achieve state-of-the-art performance against several baseline methods in an unsupervised manner.
https://arxiv.org/abs/2403.05064
Training-free metrics (a.k.a. zero-cost proxies) are widely used to avoid resource-intensive neural network training, especially in Neural Architecture Search (NAS). Recent studies show that existing training-free metrics have several limitations, such as limited correlation and poor generalisation across different search spaces and tasks. Hence, we propose Sample-Wise Activation Patterns and its derivative, SWAP-Score, a novel high-performance training-free metric. It measures the expressivity of networks over a batch of input samples. The SWAP-Score is strongly correlated with ground-truth performance across various search spaces and tasks, outperforming 15 existing training-free metrics on NAS-Bench-101/201/301 and TransNAS-Bench-101. The SWAP-Score can be further enhanced by regularisation, which leads to even higher correlations in cell-based search spaces and enables model size control during the search. For example, Spearman's rank correlation coefficient between regularised SWAP-Score and CIFAR-100 validation accuracies on NAS-Bench-201 networks is 0.90, significantly higher than 0.80 from the second-best metric, NWOT. When integrated with an evolutionary algorithm for NAS, our SWAP-NAS achieves competitive performance on CIFAR-10 and ImageNet in approximately 6 minutes and 9 minutes of GPU time respectively.
https://arxiv.org/abs/2403.04161
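One plausible reading of a sample-wise activation pattern measure — counting distinct ReLU on/off patterns across a batch, in the spirit of NWOT-style expressivity metrics — can be sketched as follows. This is an illustration, not the paper's definition of SWAP-Score.

```python
import numpy as np

def activation_pattern_count(pre_activations):
    """Count distinct sample-wise ReLU activation patterns over a batch.
    pre_activations: (batch, units) array of pre-ReLU values."""
    patterns = (np.asarray(pre_activations) > 0).astype(np.int8)
    return len(np.unique(patterns, axis=0))

# Three samples, two units: the first two samples share one pattern
acts = [[0.3, -0.2], [1.1, -0.7], [-0.5, 0.9]]
print(activation_pattern_count(acts))  # 2 distinct patterns
```

Networks that map a batch to many distinct patterns partition the input space more finely, which is why such counts serve as cheap expressivity proxies.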
Neural network models have a number of hyperparameters that must be chosen along with their architecture. This can be a heavy burden on a novice user, who must choose an architecture and assign values to its parameters. In most cases, default hyperparameters and architectures are used. Significant improvements to model accuracy can be achieved through the evaluation of multiple architectures. A process known as Neural Architecture Search (NAS) may be applied to automatically evaluate a large number of such architectures. A system integrating open source tools for Neural Architecture Search (OpenNAS), in the classification of images, has been developed as part of this research. OpenNAS takes any dataset of grayscale or RGB images, and generates Convolutional Neural Network (CNN) architectures based on a range of metaheuristics using either an AutoKeras, a transfer learning or a Swarm Intelligence (SI) approach. Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO) are used as the SI algorithms. Furthermore, models developed through such metaheuristics may be combined using stacking ensembles. In the context of this paper, we focus on training and optimizing CNNs using the Swarm Intelligence (SI) components of OpenNAS. Two major types of SI algorithms, namely PSO and ACO, are compared to see which is more effective in generating higher model accuracies. It is shown, with our experimental design, that the PSO algorithm performs better than ACO. The performance improvement of PSO is most notable with a more complex dataset. As a baseline, the performance of fine-tuned pre-trained models is also evaluated.
https://arxiv.org/abs/2403.03781
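The canonical PSO update such SI-based search builds on can be sketched on a toy objective; the hyperparameters and the quadratic objective below are illustrative stand-ins for validation accuracy, not OpenNAS internals.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    """One canonical PSO velocity/position update for all particles."""
    if rng is None:
        rng = np.random.default_rng(0)
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v

# Toy run: minimize f(x) = ||x||^2 over a 2-D search space
rng = np.random.default_rng(0)
f = lambda p: (p ** 2).sum(axis=1)
x = rng.uniform(-5, 5, (10, 2))          # 10 particles
v = np.zeros_like(x)
pbest, pbest_f = x.copy(), f(x)
initial_best = pbest_f.min()
for _ in range(50):
    gbest = pbest[pbest_f.argmin()]
    x, v = pso_step(x, v, pbest, gbest, rng=rng)
    fx = f(x)
    better = fx < pbest_f
    pbest[better], pbest_f[better] = x[better], fx[better]
print(initial_best, "->", pbest_f.min())
```

In NAS settings the particle position encodes architecture choices and f is replaced by (negated) validation accuracy.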
Predictor-based methods have substantially enhanced Neural Architecture Search (NAS) optimization. The efficacy of these predictors is largely influenced by the method of encoding neural network architectures. While traditional encodings used an adjacency matrix describing the graph structure of a neural network, novel encodings embrace a variety of approaches from unsupervised pretraining of latent representations to vectors of zero-cost proxies. In this paper, we categorize and investigate neural encodings into three main types: structural, learned, and score-based. Furthermore, we extend these encodings and introduce \textit{unified encodings}, which extend NAS predictors to multiple search spaces. Our analysis draws from experiments conducted on over 1.5 million neural network architectures on NAS spaces such as NASBench-101 (NB101), NB201, NB301, Network Design Spaces (NDS), and TransNASBench-101. Building on our study, we present our predictor \textbf{FLAN}: \textbf{Fl}ow \textbf{A}ttention for \textbf{N}AS. FLAN integrates critical insights on predictor design, transfer learning, and \textit{unified encodings} to enable more than an order of magnitude cost reduction for training NAS accuracy predictors. Our implementation and encodings for all neural networks are open-sourced at \href{this https URL}{this https URL\_nas}.
https://arxiv.org/abs/2403.02484
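A minimal example of the traditional structural encoding mentioned above — a flattened adjacency matrix concatenated with one-hot operation labels — assuming a hypothetical four-operation set:

```python
import numpy as np

OPS = ["conv3x3", "conv1x1", "maxpool", "skip"]  # hypothetical operation set

def structural_encoding(adj, ops):
    """Concatenate a flattened adjacency matrix with one-hot op labels."""
    onehot = np.zeros((len(ops), len(OPS)))
    for i, op in enumerate(ops):
        onehot[i, OPS.index(op)] = 1.0
    return np.concatenate([np.asarray(adj, dtype=float).ravel(), onehot.ravel()])

adj = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]  # 3-node DAG
vec = structural_encoding(adj, ["conv3x3", "maxpool", "skip"])
print(vec.shape)  # 3*3 adjacency entries + 3*4 one-hot entries = (21,)
```

Learned and score-based encodings replace this fixed vector with pretrained latent representations or zero-cost proxy values, respectively.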
Efficient deployment of neural networks (NN) requires the co-optimization of accuracy and latency. For example, hardware-aware neural architecture search has been used to automatically find NN architectures that satisfy a latency constraint on a specific hardware device. Central to these search algorithms is a prediction model that is designed to provide a hardware latency estimate for a candidate NN architecture. Recent research has shown that the sample efficiency of these predictive models can be greatly improved through pre-training on some \textit{training} devices with many samples, and then transferring the predictor to the \textit{test} (target) device. Transfer learning and meta-learning methods have been used for this, but often exhibit significant performance variability. Additionally, the evaluation of existing latency predictors has been largely done on hand-crafted training/test device sets, making it difficult to ascertain design features that compose a robust and general latency predictor. To address these issues, we introduce a comprehensive suite of latency prediction tasks obtained in a principled way through automated partitioning of hardware device sets. We then design a general latency predictor to comprehensively study (1) the predictor architecture, (2) NN sample selection methods, (3) hardware device representations, and (4) NN operation encoding schemes. Building on conclusions from our study, we present an end-to-end latency predictor training strategy that outperforms existing methods on 11 out of 12 difficult latency prediction tasks, improving latency prediction by 22.5\% on average, and up to 87.6\% on the hardest tasks. Focusing on latency prediction, our HW-Aware NAS reports a $5.8\times$ speedup in wall-clock time. Our code is available on \href{this https URL}{this https URL\_latency}.
https://arxiv.org/abs/2403.02446
Video motion magnification is a technique to capture and amplify subtle motion in a video that is invisible to the naked eye. The deep learning-based prior work successfully demonstrates the modelling of the motion magnification problem with outstanding quality compared to conventional signal processing-based ones. However, it still lags behind real-time performance, which prevents it from being extended to various online applications. In this paper, we investigate an efficient deep learning-based motion magnification model that runs in real time for full-HD resolution videos. Due to the specific network design of the prior art, i.e., its inhomogeneous architecture, the direct application of existing neural architecture search methods is complicated. Instead of automatic search, we carefully investigate the architecture module by module for its role and importance in the motion magnification task. Two key findings are 1) reducing the spatial resolution of the latent motion representation in the decoder provides a good trade-off between computational efficiency and task quality, and 2) surprisingly, only a single linear layer and a single branch in the encoder are sufficient for the motion magnification task. Based on these findings, we introduce a real-time deep learning-based motion magnification model with 4.2\(\times\) fewer FLOPs that is 2.7\(\times\) faster than the prior art while maintaining comparable quality.
https://arxiv.org/abs/2403.01898
As machine learning (ML) algorithms get deployed in an ever-increasing number of applications, these algorithms need to achieve better trade-offs between high accuracy, high throughput and low latency. This paper introduces NASH, a novel approach that applies neural architecture search to machine learning hardware. Using NASH, hardware designs can achieve not only high throughput and low latency but also superior accuracy performance. We present four versions of the NASH strategy in this paper, all of which show higher accuracy than the original models. The strategy can be applied to various convolutional neural networks, selecting specific model operations among many to guide the training process toward higher accuracy. Experimental results show that applying NASH on ResNet18 or ResNet34 achieves a top 1 accuracy increase of up to 3.1% and a top 5 accuracy increase of up to 2.2% compared to the non-NASH version when tested on the ImageNet data set. We also integrated this approach into the FINN hardware model synthesis tool to automate the application of our approach and the generation of the hardware model. Results show that using FINN can achieve a maximum throughput of 324.5 fps. In addition, NASH models can also result in a better trade-off between accuracy and hardware resource utilization. The accuracy-hardware (HW) Pareto curve shows that the models with the four NASH versions represent the best trade-offs achieving the highest accuracy for a given HW utilization. The code for our implementation is open-source and publicly available on GitHub at this https URL.
https://arxiv.org/abs/2403.01845
Neural Architecture Search (NAS), aiming at automatically designing neural architectures by machines, has been considered a key step toward automatic machine learning. One notable NAS branch is weight-sharing NAS, which significantly improves search efficiency and allows NAS algorithms to run on ordinary computers. Despite receiving high expectations, this category of methods suffers from low search effectiveness. By employing a generalization boundedness tool, we demonstrate that the devil behind this drawback is the untrustworthy architecture rating with the oversized search space of the possible architectures. Addressing this problem, we modularize a large search space into blocks with small search spaces and develop a family of models with the distilling neural architecture (DNA) techniques. These proposed models, namely a DNA family, are capable of resolving multiple dilemmas of weight-sharing NAS, such as scalability, efficiency, and multi-modal compatibility. Our proposed DNA models can rate all architecture candidates, as opposed to previous works that can only access a sub-search space using heuristic algorithms. Moreover, under a certain computational complexity constraint, our method can seek architectures with different depths and widths. Extensive experimental evaluations show that our models achieve state-of-the-art top-1 accuracy of 78.9% and 83.6% on ImageNet for a mobile convolutional network and a small vision transformer, respectively. Additionally, we provide in-depth empirical analysis and insights into neural architecture ratings. Code available: \url{this https URL}.
https://arxiv.org/abs/2403.01326
Spiking Neural Networks (SNNs) are based on a more biologically inspired approach than usual artificial neural networks. Such models are characterized by complex dynamics between neurons and spikes, and are very sensitive to the hyperparameters, making their optimization challenging. To tackle hyperparameter optimization of SNNs, we initially extended the signal loss issue of SNNs to what we call silent networks. These networks fail to emit enough spikes at their outputs due to mistuned hyperparameters or architecture. Generally, search spaces are heavily restrained, sometimes even discretized, to prevent the sampling of such networks. By defining an early stopping criterion detecting silent networks and by designing specific constraints, we were able to instantiate larger and more flexible search spaces. We applied a constrained Bayesian optimization technique, which was asynchronously parallelized, as the evaluation time of an SNN is highly stochastic. Large-scale experiments were carried out on a multi-GPU Petascale architecture. By leveraging silent networks, results show an acceleration of the search, while maintaining good performance of both the optimization algorithm and the best solution obtained. We were able to apply our methodology to two popular training algorithms, known as spike timing dependent plasticity and surrogate gradient. Early detection allowed us to prevent worthless and costly computation, directing the search toward promising hyperparameter combinations. Our methodology could be applied to multi-objective problems, where the spiking activity is often minimized to reduce the energy consumption. In this scenario, it becomes essential to find the delicate frontier between low-spiking and silent networks. Finally, our approach may have implications for neural architecture search, particularly in defining suitable spiking architectures.
https://arxiv.org/abs/2403.00450
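The early-stopping idea for silent networks can be sketched as a cheap output spike-count check wrapped around an otherwise expensive evaluation; function names and the threshold are illustrative, not the paper's implementation.

```python
def is_silent(output_spike_counts, min_total_spikes=1):
    """Flag a candidate SNN whose output layer emits too few spikes."""
    return sum(output_spike_counts) < min_total_spikes

def evaluate(candidate, run_epoch, n_epochs=10):
    """Abort the (expensive) evaluation as soon as silence is detected."""
    for epoch in range(n_epochs):
        spike_counts = run_epoch(candidate, epoch)
        if is_silent(spike_counts):
            return None  # pruned: no fitness computed, no further compute spent
    return candidate

print(evaluate("net-A", lambda c, e: [0, 0, 0]))  # None (silent, pruned)
print(evaluate("net-B", lambda c, e: [5, 2, 1]))  # net-B (kept)
```

Pruning silent candidates early is what frees the Bayesian optimizer to explore a larger, unrestrained search space without wasting full evaluations on networks that cannot learn.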
Neural Architecture Search (NAS) paves the way for the automatic definition of Neural Network (NN) architectures, attracting increasing research attention and offering solutions in various scenarios. This study introduces a novel NAS solution, called Flat Neural Architecture Search (FlatNAS), which explores the interplay between a novel figure of merit based on robustness to weight perturbations and single NN optimization with Sharpness-Aware Minimization (SAM). FlatNAS is the first work in the literature to systematically explore flat regions in the loss landscape of NNs in a NAS procedure, while jointly optimizing their performance on in-distribution data, their out-of-distribution (OOD) robustness, and constraining the number of parameters in their architecture. Differently from current studies primarily concentrating on OOD algorithms, FlatNAS successfully evaluates the impact of NN architectures on OOD robustness, a crucial aspect in real-world applications of machine and deep learning. FlatNAS achieves a good trade-off between performance, OOD generalization, and the number of parameters, by using only in-distribution data in the NAS exploration. The OOD robustness of the NAS-designed models is evaluated by focusing on robustness to input data corruptions, using popular benchmark datasets in the literature.
https://arxiv.org/abs/2402.19102
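The SAM step referenced above — take the gradient at the worst-case point within a small L2 ball around the current weights — can be sketched on a toy quadratic loss. This illustrates the standard SAM rule, not FlatNAS itself.

```python
import numpy as np

def sam_gradient(w, grad_fn, rho=0.05):
    """Sharpness-Aware Minimization: gradient at the worst-case nearby point."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent direction, radius rho
    return grad_fn(w + eps)                      # gradient used for the update

# Toy loss L(w) = 0.5 * ||w||^2, whose gradient is w itself
grad_fn = lambda w: w
w = np.array([1.0, -2.0])
for _ in range(100):
    w = w - 0.1 * sam_gradient(w, grad_fn)
print(np.linalg.norm(w))  # converges toward the (flat) minimum at the origin
```

Optimizing each candidate this way biases the search toward flat loss regions, which is the robustness-to-weight-perturbation figure of merit FlatNAS exploits.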
Building efficient neural network architectures can be a time-consuming task requiring extensive expert knowledge. This task becomes particularly challenging for edge devices because one has to consider parameters such as power consumption during inferencing, model size, inferencing speed, and CO2 emissions. In this article, we introduce a novel framework designed to automatically discover new neural network architectures based on user-defined parameters, an expert system, and an LLM trained on a large amount of open-domain knowledge. The introduced framework (LeMo-NADe) is tailored to be used by non-AI experts, does not require a predetermined neural architecture search space, and considers a large set of edge device-specific parameters. We implement and validate this proposed neural architecture discovery framework using CIFAR-10, CIFAR-100, and ImageNet16-120 datasets while using GPT-4 Turbo and Gemini as the LLM component. We observe that the proposed framework can rapidly (within hours) discover intricate neural network models that perform extremely well across a diverse set of application settings defined by the user.
https://arxiv.org/abs/2402.18443
Pareto front profiling in multi-objective optimization (MOO), i.e. finding a diverse set of Pareto optimal solutions, is challenging, especially with expensive objectives like neural network training. Typically, in MOO neural architecture search (NAS), we aim to balance performance and hardware metrics across devices. Prior NAS approaches simplify this task by incorporating hardware constraints into the objective function, but profiling the Pareto front necessitates a search for each constraint. In this work, we propose a novel NAS algorithm that encodes user preferences for the trade-off between performance and hardware metrics, and yields representative and diverse architectures across multiple devices in just one search run. To this end, we parameterize the joint architectural distribution across devices and multiple objectives via a hypernetwork that can be conditioned on hardware features and preference vectors, enabling zero-shot transferability to new devices. Extensive experiments with up to 19 hardware devices and 3 objectives showcase the effectiveness and scalability of our method. Finally, we show that, without additional costs, our method outperforms existing MOO NAS methods across qualitatively different search spaces and datasets, including MobileNetV3 on ImageNet-1k and a Transformer space on machine translation.
https://arxiv.org/abs/2402.18213
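The two core ingredients described above — a preference vector encoding the performance/hardware trade-off, and a hypernetwork conditioned on hardware features and that preference — can be sketched in a few lines. This is a toy illustration under assumed shapes and a linear hypernetwork, not the paper's architecture; the scalarization shown is plain linear weighting, one common MOO choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def scalarize(objectives, preference):
    """Linear scalarization: weight normalized objectives by a
    preference vector drawn from the probability simplex."""
    return float(np.dot(objectives, preference))

def hypernetwork(hw_features, preference, W):
    """Toy hypernetwork: maps (hardware features, preference vector)
    to logits over candidate operations. W is its weight matrix."""
    x = np.concatenate([hw_features, preference])
    return W @ x  # one logit per candidate operation

# Sample a preference from the simplex (Dirichlet with unit parameters).
pref = rng.dirichlet(np.ones(2))        # trade-off: [accuracy, latency]
hw = np.array([0.3, 0.7, 0.1])          # assumed device feature embedding
W = rng.standard_normal((4, 5))         # 4 candidate ops, 5 input dims

logits = hypernetwork(hw, pref, W)
probs = np.exp(logits - logits.max())
probs /= probs.sum()                    # architecture distribution over ops
```

Because the device features are an input rather than baked into the weights, swapping in a new device's feature vector yields a new architecture distribution with no retraining — the zero-shot transfer the abstract claims.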
It is critical to deploy complicated neural network models on hardware with limited resources. This paper proposes a novel model quantization method, named the Low-Cost Proxy-Based Adaptive Mixed-Precision Model Quantization (LCPAQ), which contains three key modules. The hardware-aware module is designed by considering the hardware limitations, while an adaptive mixed-precision quantization module is developed to evaluate the quantization sensitivity by using the Hessian matrix and Pareto frontier techniques. Integer linear programming is used to fine-tune the quantization across different layers. Then the low-cost proxy neural architecture search module efficiently explores the ideal quantization hyperparameters. Experiments on the ImageNet demonstrate that the proposed LCPAQ achieves comparable or superior quantization accuracy to existing mixed-precision models. Notably, LCPAQ achieves 1/200 of the search time compared with existing methods, which provides a shortcut in practical quantization use for resource-limited devices.
https://arxiv.org/abs/2402.17706
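The bit-allocation step described above — Hessian-based sensitivities plus integer programming over per-layer bit-widths — can be illustrated with a tiny brute-force integer program. The sensitivities, parameter counts, budget, and error proxy below are all invented for illustration; LCPAQ's actual values come from Hessian-trace estimation and an ILP solver rather than enumeration.

```python
from itertools import product

# Hypothetical per-layer sensitivities (e.g., Hessian traces) and sizes.
sensitivity = [5.0, 1.0, 0.2]        # layer 0 is most sensitive
params      = [1000, 4000, 2000]     # parameters per layer
bit_choices = [4, 8]
budget_bits = 5 * sum(params)        # average of 5 bits per parameter

def quant_error(bits):
    # Common proxy: quantization MSE shrinks ~4x per extra bit.
    return 2.0 ** (-2 * bits)

# Enumerate all assignments (an ILP solver would do this at scale):
# minimize sensitivity-weighted error subject to the size budget.
best = None
for assign in product(bit_choices, repeat=len(params)):
    size = sum(b * p for b, p in zip(assign, params))
    if size > budget_bits:
        continue
    cost = sum(s * quant_error(b) for s, b in zip(sensitivity, assign))
    if best is None or cost < best[0]:
        best = (cost, assign)

cost, bits = best
```

With these numbers the program spends its budget where it matters: the most sensitive layer receives 8 bits while the large, insensitive layers drop to 4, which is exactly the behavior mixed-precision quantization is after.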
Developing robust and interpretable vision systems is a crucial step towards trustworthy artificial intelligence. In this regard, a promising paradigm considers embedding task-required invariant structures, e.g., geometric invariance, in the fundamental image representation. However, such invariant representations typically exhibit limited discriminability, limiting their applications in larger-scale trustworthy vision tasks. For this open problem, we conduct a systematic investigation of hierarchical invariance, exploring this topic from theoretical, practical, and application perspectives. At the theoretical level, we show how to construct over-complete invariants with a Convolutional Neural Networks (CNN)-like hierarchical architecture yet in a fully interpretable manner. The general blueprint, specific definitions, invariant properties, and numerical implementations are provided. At the practical level, we discuss how to customize this theoretical framework into a given task. With the over-completeness, discriminative features w.r.t. the task can be adaptively formed in a Neural Architecture Search (NAS)-like manner. We demonstrate the above arguments with accuracy, invariance, and efficiency results on texture, digit, and parasite classification experiments. Furthermore, at the application level, our representations are explored in real-world forensics tasks on adversarial perturbations and Artificial Intelligence Generated Content (AIGC). Such applications reveal that the proposed strategy not only realizes the theoretically promised invariance, but also exhibits competitive discriminability even in the era of deep learning. For robust and interpretable vision tasks at larger scales, hierarchical invariant representation can be considered as an effective alternative to traditional CNN and invariants.
https://arxiv.org/abs/2402.15430
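One standard, fully interpretable way to build the kind of invariant-yet-CNN-like layer the abstract describes is to pool filter responses over a transformation group. The sketch below (my own minimal construction, not the paper's specific invariants) filters an image with all four 90-degree rotations of one oriented kernel, max-pools over orientations, and global-averages; the resulting scalar is provably unchanged when the input is rotated by 90 degrees.

```python
import numpy as np

def correlate2d_valid(img, k):
    """Plain 'valid' cross-correlation with a 3x3 kernel."""
    H, W = img.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            out[i, j] = np.sum(img[i:i+3, j:j+3] * k)
    return out

def c4_invariant_feature(img, base_kernel):
    """One invariant 'layer': filter with all 4 rotations of the kernel
    (the group C4), max-pool over orientations, global-average.
    Rotating the input permutes the orientation channels and rotates
    each response map, so the pooled scalar is unchanged."""
    responses = [correlate2d_valid(img, np.rot90(base_kernel, r))
                 for r in range(4)]
    stacked = np.stack(responses)          # shape (4, H-2, W-2)
    return float(stacked.max(axis=0).mean())

rng = np.random.default_rng(1)
img = rng.standard_normal((8, 8))
k = np.array([[1., 0., -1.],
              [2., 0., -2.],
              [1., 0., -1.]])              # an oriented (Sobel-like) kernel

f0 = c4_invariant_feature(img, k)
f1 = c4_invariant_feature(np.rot90(img), k)   # rotated input, same feature
```

Stacking such layers, with many base kernels per layer, gives an over-complete hierarchy in the spirit of the abstract: each stage stays exactly invariant while discriminative capacity grows with the filter bank.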
Miniaturized autonomous unmanned aerial vehicles (UAVs) are gaining popularity due to their small size, enabling new tasks such as indoor navigation or people monitoring. Nonetheless, their size and simple electronics pose severe challenges in implementing advanced onboard intelligence. This work proposes a new automatic optimization pipeline for visual pose estimation tasks using Deep Neural Networks (DNNs). The pipeline leverages two different Neural Architecture Search (NAS) algorithms to pursue a vast complexity-driven exploration in the DNNs' architectural space. The obtained networks are then deployed on an off-the-shelf nano-drone equipped with a parallel ultra-low power System-on-Chip leveraging a set of novel software kernels for the efficient fused execution of critical DNN layer sequences. Our results improve the state-of-the-art reducing inference latency by up to 3.22x at iso-error.
https://arxiv.org/abs/2402.15273
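The "fused execution of critical DNN layer sequences" mentioned above relies on tricks like folding a BatchNorm into the preceding convolution, so two layers cost one kernel launch at inference time. Below is a minimal NumPy sketch of that classic fusion (shapes and values are illustrative; the paper's kernels target a parallel ultra-low-power SoC, not NumPy). By linearity, applying BN after the conv equals a single conv with rescaled weights and a shifted bias.

```python
import numpy as np

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm statistics (gamma, beta, running mean/var) into
    the preceding convolution's weights and bias."""
    scale = gamma / np.sqrt(var + eps)        # per output channel
    w_fused = w * scale[:, None, None, None]  # w: (C_out, C_in, kH, kW)
    b_fused = (b - mean) * scale + beta
    return w_fused, b_fused

rng = np.random.default_rng(2)
c_out, c_in = 4, 3
w = rng.standard_normal((c_out, c_in, 3, 3))
b = rng.standard_normal(c_out)
gamma, beta = rng.standard_normal(c_out), rng.standard_normal(c_out)
mean, var = rng.standard_normal(c_out), rng.random(c_out) + 0.5

# Check equivalence at one output "pixel": y = conv(x) + b, then BN,
# versus a single conv with the fused weights and bias.
x_patch = rng.standard_normal((c_in, 3, 3))
y = np.tensordot(w, x_patch, axes=3) + b
bn_out = gamma * (y - mean) / np.sqrt(var + 1e-5) + beta
wf, bf = fuse_conv_bn(w, b, gamma, beta, mean, var)
fused_out = np.tensordot(wf, x_patch, axes=3) + bf
```

Fusion like this removes an entire memory round-trip per eliminated layer, which is where much of the reported latency reduction on memory-bound edge hardware typically comes from.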