Neural architecture search (NAS) must balance the exploration of expressive, broad search spaces that enable architectural innovation against the need to evaluate architectures efficiently enough to search such spaces effectively. We investigate surrogate model training for improving search in highly expressive NAS search spaces based on context-free grammars. We show that i) surrogate models trained either on zero-cost-proxy metrics and neural graph features (GRAF) or by fine-tuning an off-the-shelf LM have high predictive power for the performance of architectures both within and across datasets, ii) these surrogates can be used to filter out bad architectures when searching on novel datasets, thereby significantly speeding up search and achieving better final performance, and iii) the surrogates can further be used directly as the search objective for huge speed-ups.
https://arxiv.org/abs/2504.12971
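A minimal sketch of the surrogate idea, with synthetic stand-ins for the paper's GRAF / zero-cost-proxy features and measured accuracies: fit an off-the-shelf regressor on per-architecture feature vectors, check its ranking power with Spearman correlation, and use it as the filter described in (ii).

```python
# Hedged sketch: a surrogate performance predictor over cheap architecture
# features. X and y are synthetic placeholders, not the paper's data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_archs, n_feats = 500, 8                    # e.g., 8 proxy scores per architecture
X = rng.normal(size=(n_archs, n_feats))      # zero-cost-proxy / GRAF-style features
y = X @ rng.normal(size=n_feats) + 0.1 * rng.normal(size=n_archs)  # stand-in accuracy

surrogate = RandomForestRegressor(n_estimators=200, random_state=0)
surrogate.fit(X[:400], y[:400])

pred = surrogate.predict(X[400:])
rho, _ = spearmanr(pred, y[400:])            # rank correlation = ranking power
print(f"held-out Spearman rho: {rho:.2f}")

keep = pred >= np.quantile(pred, 0.75)       # filter: keep top-quartile predictions
print(f"{keep.sum()} of {keep.size} candidates pass the surrogate filter")
```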
This paper introduces a novel framework for designing efficient neural network architectures specifically tailored to tiny machine learning (TinyML) platforms. By leveraging large language models (LLMs) for neural architecture search (NAS), a vision transformer (ViT)-based knowledge distillation (KD) strategy, and an explainability module, the approach strikes an optimal balance between accuracy, computational efficiency, and memory usage. The LLM-guided search explores a hierarchical search space, refining candidate architectures through Pareto optimization based on accuracy, multiply-accumulate operations (MACs), and memory metrics. The best-performing architectures are further fine-tuned using logits-based KD with a pre-trained ViT-B/16 model, which enhances generalization without increasing model size. Evaluated on the CIFAR-100 dataset and deployed on an STM32H7 microcontroller (MCU), the three proposed models, LMaNet-Elite, LMaNet-Core, and QwNet-Core, achieve accuracy scores of 74.50%, 74.20% and 73.00%, respectively. All three models surpass current state-of-the-art (SOTA) models, such as MCUNet-in3/in4 (69.62% / 72.86%) and XiNet (72.27%), while maintaining a low computational cost of less than 100 million MACs and adhering to the stringent 320 KB static random-access memory (SRAM) constraint. These results demonstrate the efficiency and performance of the proposed framework for TinyML platforms, underscoring the potential of combining LLM-driven search, Pareto optimization, KD, and explainability to develop accurate, efficient, and interpretable models. This approach opens new possibilities in NAS, enabling the design of efficient architectures specifically suited for TinyML.
https://arxiv.org/abs/2504.09685
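The Pareto-optimization step above reduces, at its core, to non-dominated filtering over (accuracy, MACs, memory). A small self-contained sketch, with made-up candidate tuples rather than the paper's actual search results:

```python
# Hedged sketch: keep candidates not dominated on (accuracy up, MACs down,
# memory down). Candidate values are illustrative placeholders.
from typing import List, Tuple

Candidate = Tuple[str, float, float, float]  # (name, accuracy %, MMACs, SRAM KB)

def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if no worse on every objective and strictly better on one."""
    no_worse = a[1] >= b[1] and a[2] <= b[2] and a[3] <= b[3]
    strictly = a[1] > b[1] or a[2] < b[2] or a[3] < b[3]
    return no_worse and strictly

def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    return [c for c in cands if not any(dominates(o, c) for o in cands)]

cands = [("net-a", 74.5, 98.0, 310.0), ("net-b", 73.0, 60.0, 250.0),
         ("net-c", 72.0, 99.0, 315.0)]       # net-c is dominated by net-a
print(pareto_front(cands))
```

In the real search loop, the surviving front would seed the next round of LLM-guided refinement.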
Many studies estimate energy consumption using proxy metrics like memory usage, FLOPs, and inference latency, with the assumption that reducing these metrics will also lower energy consumption in neural networks. This paper, however, takes a different approach by introducing an energy-efficient Neural Architecture Search (NAS) method that directly focuses on identifying architectures that minimize energy consumption while maintaining acceptable accuracy. Unlike previous methods that primarily target vision and language tasks, the approach proposed here specifically addresses tabular datasets. Remarkably, the optimal architecture suggested by this method can reduce energy consumption by up to 92% compared to architectures recommended by conventional NAS.
https://arxiv.org/abs/2504.08359
This work presents MicroNAS, an automated neural architecture search tool specifically designed to create models optimized for microcontrollers with small memory resources. The ESP32 microcontroller, with 320 KB of memory, is used as the target platform. The artificial intelligence contribution lies in a novel method for optimizing convolutional neural network and gated recurrent unit architectures by considering the memory size of the target microcontroller as a guide. A comparison is made between memory-driven model optimization and traditional two-stage methods, which use pruning, to show the effectiveness of the proposed framework. To demonstrate the engineering application of MicroNAS, a fall detection system (FDS) for lower-limb amputees is developed as a pilot study. A critical challenge in fall detection studies, class imbalance in the dataset, is addressed. The results show that MicroNAS models achieved higher F1-scores than alternative approaches, such as ensemble methods and H2O Automated Machine Learning, presenting a significant step forward in real-time FDS development. Biomechanists using body-worn sensors for activity detection can adopt the open-source code to design machine learning models tailored for microcontroller platforms with limited memory.
https://arxiv.org/abs/2504.07397
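The memory-driven guidance amounts to sizing each candidate before training and rejecting anything over the target budget. A back-of-the-envelope sketch under simplifying assumptions (parameters only, int8 storage; a real memory model would also count activations and buffers):

```python
# Hedged sketch: gate CNN/GRU candidates by estimated footprint vs. a 320 KB
# budget. Sizing rules are simplified, not MicroNAS's actual memory model.
BUDGET_BYTES = 320 * 1024

def conv2d_params(c_in: int, c_out: int, k: int) -> int:
    return c_in * c_out * k * k + c_out          # weights + biases

def gru_params(input_size: int, hidden: int) -> int:
    # 3 gates, each with input and recurrent weights plus two bias vectors
    return 3 * (input_size * hidden + hidden * hidden + 2 * hidden)

def fits_budget(param_counts, bytes_per_param: int = 1) -> bool:  # int8 weights
    return sum(param_counts) * bytes_per_param <= BUDGET_BYTES

candidate = [conv2d_params(1, 16, 3), conv2d_params(16, 32, 3), gru_params(32, 64)]
print(sum(candidate), "params ->", "accept" if fits_budget(candidate) else "reject")
```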
Federated Learning (FL) often struggles with data heterogeneity due to the naturally uneven distribution of user data across devices. Federated Neural Architecture Search (NAS) enables collaborative search for optimal model architectures tailored to heterogeneous data to achieve higher accuracy. However, this process is time-consuming due to the extensive search space and retraining. To overcome this, we introduce FedMetaNAS, a framework that integrates meta-learning with NAS within the FL context to expedite the architecture search by pruning the search space and eliminating the retraining stage. Our approach first utilizes the Gumbel-Softmax reparameterization to relax the mixed operations in the search space. We then refine the local search process by incorporating Model-Agnostic Meta-Learning, where a task-specific learner adapts both weights and architecture parameters (alphas) for individual tasks, while a meta learner adjusts the overall model weights and alphas based on gradient information from the task learners. Following the meta-update, we propose soft pruning, applying the same trick to the search space to gradually sparsify the architecture; this ensures that the performance of the chosen architecture remains robust after pruning, allowing immediate use of the model without retraining. Experimental evaluations demonstrate that FedMetaNAS accelerates the search process by more than 50% while achieving higher accuracy than FedNAS.
https://arxiv.org/abs/2504.06457
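The Gumbel-Softmax relaxation referenced above turns a discrete choice among candidate operations into a differentiable mixture. A toy sketch with three stand-in ops and illustrative logits; lowering the temperature hardens the mixture toward a one-hot selection:

```python
# Hedged sketch of a Gumbel-Softmax mixed operation. Ops, logits, and the
# annealing schedule are illustrative, not FedMetaNAS's actual search space.
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits: np.ndarray, tau: float) -> np.ndarray:
    u = rng.uniform(1e-12, 1.0, size=logits.shape)
    g = -np.log(-np.log(u))                  # Gumbel(0, 1) noise
    z = (logits + g) / tau
    e = np.exp(z - z.max())
    return e / e.sum()

ops = [lambda x: x,                          # identity / skip
       lambda x: np.maximum(x, 0),           # relu-like op
       lambda x: 0.5 * x]                    # scaled op
alpha = np.array([0.2, 1.0, -0.3])           # architecture logits (learned in practice)

x = rng.normal(size=4)
for tau in (5.0, 1.0, 0.1):                  # annealing: weights approach one-hot
    w = gumbel_softmax(alpha, tau)
    mixed = sum(wi * op(x) for wi, op in zip(w, ops))
    print(f"tau={tau:4.1f} weights={np.round(w, 2)} mixed={np.round(mixed, 2)}")
```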
This study explores the application of supervised and unsupervised autoencoders (AEs) to automate nuclei classification in clear cell renal cell carcinoma (ccRCC) images, a diagnostic task traditionally reliant on subjective visual grading by pathologists. We evaluate various AE architectures, including standard AEs, contractive AEs (CAEs), and discriminative AEs (DAEs), as well as a classifier-based discriminative AE (CDAE), optimized using the hyperparameter tuning tool Optuna. Bhattacharyya distance is selected from several candidate metrics to assess class separability in the latent space, revealing the difficulty of distinguishing adjacent grades with unsupervised models. CDAE, which integrates a supervised classifier branch, demonstrated superior performance in both latent space separation and classification accuracy. Because CDAE-CNN achieved notable improvements in classification metrics, affirming the value of supervised learning for class-specific feature extraction, the F1 score was incorporated into the tuning process to optimize classification performance. Results show significant improvements in identifying aggressive ccRCC grades by leveraging the AE's classification capability through latent clustering followed by fine-grained classification. Our model outperforms the current state of the art, CHR-Network, across all evaluated metrics. These findings suggest that integrating a classifier branch in AEs, combined with neural architecture search and contrastive learning, enhances grading automation in ccRCC pathology, particularly in detecting aggressive tumor grades, and may improve diagnostic accuracy.
https://arxiv.org/abs/2504.03146
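The Bhattacharyya-distance check is straightforward to reproduce under a Gaussian model of each class's latent codes. A sketch with synthetic latent samples standing in for AE encodings of adjacent ccRCC grades; larger distances mean better separability:

```python
# Hedged sketch: Bhattacharyya distance between two classes' latent-code
# distributions, modeled as multivariate Gaussians. Samples are synthetic.
import numpy as np

def bhattacharyya_gaussian(x1: np.ndarray, x2: np.ndarray) -> float:
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    s1, s2 = np.cov(x1, rowvar=False), np.cov(x2, rowvar=False)
    s = 0.5 * (s1 + s2)
    diff = m1 - m2
    term1 = 0.125 * diff @ np.linalg.solve(s, diff)      # mean-separation term
    _, ld = np.linalg.slogdet(s)
    _, ld1 = np.linalg.slogdet(s1)
    _, ld2 = np.linalg.slogdet(s2)
    term2 = 0.5 * (ld - 0.5 * (ld1 + ld2))               # covariance-overlap term
    return float(term1 + term2)

rng = np.random.default_rng(0)
grade2 = rng.normal(0.0, 1.0, size=(200, 8))   # latent codes, one grade
grade3 = rng.normal(0.8, 1.0, size=(200, 8))   # latent codes, adjacent grade
print(f"Bhattacharyya distance: {bhattacharyya_gaussian(grade2, grade3):.3f}")
```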
In machine learning, Neural Architecture Search (NAS) requires domain knowledge of model design and a large amount of trial and error to achieve promising performance. Meanwhile, evolutionary algorithms have traditionally relied on fixed rules and pre-defined building blocks. The Large Language Model (LLM)-Guided Evolution (GE) framework transformed this approach by incorporating LLMs to directly modify model source code for image classification algorithms on CIFAR data and to intelligently guide mutations and crossovers. A key element of LLM-GE is the "Evolution of Thought" (EoT) technique, which establishes feedback loops, allowing LLMs to refine their decisions iteratively based on how previous operations performed. In this study, we perform NAS for object detection by extending LLM-GE to modify the architecture of You Only Look Once (YOLO) models to enhance performance on the KITTI dataset. Our approach intelligently adjusts the design and settings of YOLO to find optimal variants against objectives such as detection accuracy and speed. We show that LLM-GE produced variants with significant performance improvements, such as an increase in Mean Average Precision from 92.5% to 94.5%. This result highlights the flexibility and effectiveness of LLM-GE on real-world challenges, offering a novel paradigm for automated machine learning that combines LLM-driven reasoning with evolutionary strategies.
https://arxiv.org/abs/2504.02280
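A skeleton of the EoT feedback loop, with stubs in place of the LLM call and the YOLO training harness (neither reproduces the authors' prompts or evaluation):

```python
# Hedged sketch of LLM-guided evolution with an EoT-style feedback loop.
# propose_mutation() and evaluate() are stubs for an LLM API call and for
# training/validating a YOLO variant on KITTI, respectively.
import random

def propose_mutation(source: str, history: list) -> str:
    # Stub: a real system would send `source` plus `history` (what previous
    # edits did and how they scored) to an LLM and get edited code back.
    return source + f"+edit{len(history)}"

def evaluate(source: str) -> float:
    return random.random()                     # stub for mAP after training

history = []                                   # (variant, fitness): EoT feedback
best = "yolo_base"
best_fit = evaluate(best)
for _ in range(5):
    child = propose_mutation(best, history)
    fit = evaluate(child)
    history.append((child, fit))               # the LLM sees how its edit performed
    if fit > best_fit:
        best, best_fit = child, fit
print(best, round(best_fit, 3))
```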
Architecture design and optimization are challenging problems in the field of artificial neural networks. Working in this context, we present SPARCS (SPectral ARchiteCture Search), a novel architecture search protocol that exploits the spectral attributes of the inter-layer transfer matrices. SPARCS allows one to explore the space of possible architectures by spanning continuous and differentiable manifolds, thus enabling gradient-based optimization algorithms to be employed. With reference to simple benchmark models, we show that the newly proposed method yields a self-emerging architecture with just enough expressivity to handle the task under investigation and a reduced parameter count compared to other viable alternatives.
https://arxiv.org/abs/2504.00885
Multi-task neural architecture search (NAS) enables transferring architectural knowledge among different tasks. However, ranking disorder between the source task and the target task degrades architecture performance on the downstream task. We propose KTNAS, an evolutionary cross-task NAS algorithm, to enhance transfer efficiency. Our data-agnostic method converts neural architectures into graphs and uses architecture embedding vectors for the subsequent architecture performance prediction. The concept of transfer rank, an instance-based classifier, is introduced into KTNAS to address the performance degradation issue. We verify search efficiency on NAS-Bench-201 and transferability to various vision tasks on Micro TransNAS-Bench-101. The scalability of our method is demonstrated on the DARTS search space, including CIFAR-10/100, MNIST/Fashion-MNIST, and MedMNIST. Experimental results show that KTNAS outperforms peer multi-task NAS algorithms in both search efficiency and downstream task performance. Ablation studies demonstrate the vital importance of transfer rank for transfer performance.
https://arxiv.org/abs/2504.00772
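A loose, instance-based reading of transfer rank, under the assumption that it behaves like a nearest-neighbor consensus over previously evaluated architecture embeddings; the paper's actual classifier differs in detail:

```python
# Hedged sketch: rank an unseen architecture by the target-task scores of its
# nearest neighbors in embedding space. Embeddings and scores are synthetic.
import numpy as np

rng = np.random.default_rng(0)
emb_seen = rng.normal(size=(50, 16))        # embeddings of evaluated architectures
score_seen = rng.uniform(size=50)           # their measured target-task scores

def neighbor_rank_score(emb_new: np.ndarray, k: int = 5) -> float:
    d = np.linalg.norm(emb_seen - emb_new, axis=1)
    nearest = np.argsort(d)[:k]
    return float(score_seen[nearest].mean())  # neighbor consensus as rank proxy

candidates = rng.normal(size=(10, 16))
order = sorted(range(10), key=lambda i: -neighbor_rank_score(candidates[i]))
print("candidate order:", order)
```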
Neural Architecture Search (NAS) for deep learning object detection frameworks typically involves multiple modules, each performing distinct tasks. These modules contribute to a vast search space, resulting in searches that can take several GPU hours or even days, depending on the complexity of the search space. This makes joint optimization both challenging and computationally expensive. Furthermore, satisfying target device constraints across modules adds further complexity to the optimization process. To address these challenges, we propose FACETS (eFficient Once-for-All Object Detection via Constrained itEraTive Search), a novel unified iterative NAS method that refines the architecture of all modules in a cyclical manner. FACETS leverages feedback from previous iterations, alternating between fixing one module's architecture and optimizing the others. This approach reduces the overall search space while preserving interdependencies among modules, and incorporates constraints based on the target device's computational budget. In a controlled comparison against progressive and single-module search strategies, FACETS achieves architectures with up to 4.75% higher accuracy twice as fast as progressive search strategies in earlier stages, while still being able to reach a global optimum. Moreover, FACETS can iteratively refine the search space, producing better-performing architectures over time. The refined search space yields candidates with a mean accuracy up to 27% higher than global search and 5% higher than progressive search methods via random sampling.
https://arxiv.org/abs/2503.21999
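The cyclical refinement can be sketched as constrained coordinate descent over modules: fix all but one, pick the best feasible option for that module, rotate. The search space, MAC costs, and accuracy stub below are all illustrative, not the FACETS search space:

```python
# Hedged sketch: cyclic, budget-constrained module-wise architecture search.
import itertools
import random

SPACE = {"backbone": ["mbv2-s", "mbv2-m", "resnet-t"],
         "neck": ["fpn", "pan"],
         "head": ["light", "heavy"]}
MACS = {"mbv2-s": 40, "mbv2-m": 70, "resnet-t": 90,
        "fpn": 15, "pan": 25, "light": 10, "heavy": 30}
BUDGET = 110                                  # device compute budget (arbitrary units)

def evaluate(cfg: dict) -> float:
    random.seed(str(sorted(cfg.items())))     # deterministic stub for accuracy
    return random.random()

def macs(cfg: dict) -> int:
    return sum(MACS[v] for v in cfg.values())

cfg = {"backbone": "mbv2-s", "neck": "fpn", "head": "light"}
for module in itertools.islice(itertools.cycle(SPACE), 6):   # two full cycles
    candidates = [{**cfg, module: opt} for opt in SPACE[module]]
    feasible = [c for c in candidates if macs(c) <= BUDGET]  # device constraint
    cfg = max(feasible, key=evaluate)         # optimize this module, others fixed
print(cfg, macs(cfg))
```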
Spiking Neural Networks (SNNs) are highly regarded for their energy efficiency, inherent activation sparsity, and suitability for real-time processing in edge devices. However, most current SNN methods adopt architectures resembling traditional artificial neural networks (ANNs), leading to suboptimal performance when applied to SNNs. While SNNs excel in energy efficiency, they have been associated with lower accuracy than traditional ANNs when using conventional architectures. In response, in this work we present LightSNN, a rapid and efficient Neural Architecture Search (NAS) technique specifically tailored for SNNs that autonomously finds the most suitable architecture, striking a good balance between accuracy and efficiency by enforcing sparsity. Based on the spiking NAS network (SNASNet) framework, a cell-based search space including backward connections is used to build our training-free pruning-based NAS mechanism. Our technique assesses diverse spike activation patterns across different data samples using a sparsity-aware Hamming distance fitness evaluation. Thorough experiments are conducted on both static (CIFAR10 and CIFAR100) and neuromorphic (DVS128-Gesture) datasets. Our LightSNN model achieves state-of-the-art results on CIFAR10 and CIFAR100, improves performance on DVS128-Gesture by 4.49%, and significantly reduces search time, most notably offering a 98x speedup over SNASNet and running 30% faster than the best existing method on DVS128-Gesture.
https://arxiv.org/abs/2503.21846
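A sketch of a training-free, sparsity-aware Hamming fitness in the spirit described above; the exact weighting LightSNN uses may differ, so treat the sparsity multiplier as an illustrative assumption:

```python
# Hedged sketch: score an untrained candidate SNN by how distinguishable its
# binary spike patterns are across samples (pairwise Hamming distance), with a
# sparsity bonus. Patterns are synthetic stand-ins for recorded spikes.
import numpy as np

rng = np.random.default_rng(0)

def fitness(spikes: np.ndarray) -> float:
    """spikes: (n_samples, n_neurons) binary spike-pattern matrix."""
    n = len(spikes)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += np.mean(spikes[i] != spikes[j])   # normalized Hamming
    avg_dist = total / (n * (n - 1) / 2)
    sparsity = 1.0 - spikes.mean()                     # fraction of silent units
    return avg_dist * sparsity                         # sparsity-aware score

patterns = (rng.uniform(size=(8, 256)) < 0.15).astype(np.uint8)  # sparse firing
print(f"fitness: {fitness(patterns):.4f}")
```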
Monte-Carlo Tree Search (MCTS) is a powerful tool for many non-differentiable search-related problems such as adversarial games. However, the performance of such an approach depends heavily on the order of the nodes considered at each branching of the tree. If the first branches cannot distinguish between promising and deceptive configurations for the final task, the efficiency of the search is reduced exponentially. In Neural Architecture Search (NAS), as only the final architecture matters, the visiting order of the branchings can be optimized to improve learning. In this paper, we study the application of MCTS to NAS for image classification. We analyze several sampling methods and branching alternatives for MCTS and propose to learn the branching by hierarchical clustering of architectures based on their similarity, measured by the pairwise distance of the architectures' output vectors. Extensive experiments on two challenging benchmarks, CIFAR10 and ImageNet, show that MCTS, if provided with a good branching hierarchy, can yield promising solutions more efficiently than other approaches to NAS problems.
https://arxiv.org/abs/2503.21061
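The learned branching reduces to hierarchical clustering over per-architecture output vectors. A minimal sketch with synthetic outputs (the average-linkage choice is an assumption):

```python
# Hedged sketch: embed each architecture by its output vector on a probe
# batch, then hierarchically cluster so MCTS branches separate dissimilar
# architectures first. Output vectors here are synthetic.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
outputs = rng.normal(size=(20, 10))          # per-architecture output vectors

dists = pdist(outputs, metric="euclidean")   # pairwise distances
tree = linkage(dists, method="average")      # hierarchy = MCTS branching order
top_split = fcluster(tree, t=2, criterion="maxclust")
print("first branching assigns architectures to:", top_split)
```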
Neural architecture search (NAS) provides a systematic framework for automating the design of neural network architectures, yet its widespread adoption is hindered by prohibitive computational requirements. Existing zero-cost proxy methods, while reducing search overhead, demonstrate inadequate performance in architecture ranking tasks, particularly for Transformer-based models where they often underperform simple parameter counting metrics. Current automated proxy discovery approaches suffer from extended search times, susceptibility to data overfitting, and structural complexity. This paper introduces a novel zero-cost proxy methodology that quantifies model capacity through efficient weight statistics computation while decomposing Transformer architectures into functionally distinct sub-modules, thereby optimizing the balance of their contributions to overall performance. Our comprehensive evaluation demonstrates the superiority of this approach, achieving a Spearman's rho of 0.76 and Kendall's tau of 0.53 on the FlexiBERT benchmark. The proposed method exhibits exceptional computational efficiency while maintaining robust performance across diverse NAS benchmark tasks, offering a practical solution for large-scale architecture search.
https://arxiv.org/abs/2503.18646
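The evaluation protocol, a cheap score per architecture plus rank correlation against true accuracy, is easy to sketch. The weight-statistics score below (std of initialized weights, accumulated over sub-modules) is an illustrative stand-in for the paper's proxy:

```python
# Hedged sketch: a weight-statistics zero-cost proxy scored with Spearman's
# rho and Kendall's tau, as on FlexiBERT. Architectures and accuracies are
# synthetic placeholders.
import numpy as np
from scipy.stats import kendalltau, spearmanr

rng = np.random.default_rng(0)

def proxy_score(modules: list) -> float:
    # modules: weight tensors of functionally distinct sub-modules
    return sum(float(np.std(w)) * w.size for w in modules)

n = 30
archs = [[rng.normal(scale=rng.uniform(0.01, 0.05), size=(64, 64))
          for _ in range(rng.integers(2, 6))] for _ in range(n)]
scores = np.array([proxy_score(a) for a in archs])
true_acc = scores + rng.normal(scale=0.2 * scores.std(), size=n)  # synthetic target

rho, _ = spearmanr(scores, true_acc)
tau, _ = kendalltau(scores, true_acc)
print(f"Spearman rho={rho:.2f}, Kendall tau={tau:.2f}")
```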
Spatial-temporal sequence forecasting (STSF) is a long-standing research problem with widespread real-world applications. Neural architecture search (NAS), which automates neural network design, has been shown to be effective in tackling the STSF problem. However, existing NAS methods for STSF focus on generating architectures in a time-consuming, data-driven fashion, which heavily limits their ability to use background knowledge and to explore the complicated search trajectory. Large language models (LLMs) have shown remarkable decision-making ability backed by comprehensive internal world knowledge, but how they could benefit NAS for STSF remains unexplored. In this paper, we propose a novel LLM-based NAS method for STSF. Instead of directly generating architectures with the LLM, we elicit the LLM's capability through a multi-level enhancement mechanism. Specifically, at the step level, we decompose the generation task into decision steps with careful prompt engineering and prompt the LLM to serve as an instructor for architecture search based on its internal knowledge. At the instance level, we utilize a one-step tuning framework to quickly evaluate each architecture instance and a memory bank to accumulate knowledge that improves the LLM's search ability. At the task level, we propose a two-stage architecture search that balances an exploration stage against an optimization stage, reducing the possibility of being trapped in local optima. Extensive experimental results demonstrate that our method achieves competitive effectiveness with superior efficiency compared to existing NAS methods for STSF.
https://arxiv.org/abs/2503.17994
Approximate deep neural networks (AxDNNs) are promising for enhancing energy efficiency in real-world devices. One of the key contributors behind this enhanced energy efficiency in AxDNNs is the use of approximate multipliers. Unfortunately, the simulation of approximate multipliers does not usually scale well on CPUs and GPUs. As a consequence, this slows down the overall simulation of AxDNNs aimed at identifying the appropriate approximate multipliers to achieve high energy efficiency with a minimum accuracy loss. To address this problem, we present a novel XAI-Gen methodology, which leverages the analytical model of the emerging hardware accelerator (e.g., Google TPU v4) and explainable artificial intelligence (XAI) to precisely identify the non-critical layers for approximation and quickly discover the appropriate approximate multipliers for AxDNN layers. Our results show that XAI-Gen achieves up to 7x lower energy consumption with only 1-2% accuracy loss. We also showcase the effectiveness of the XAI-Gen approach through a neural architecture search (XAI-NAS) case study. Interestingly, XAI-NAS achieves 40% higher energy efficiency with up to 5x less execution time when compared to the state-of-the-art NAS methods for generating AxDNNs.
https://arxiv.org/abs/2503.16583
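The XAI step can be approximated with any layer-importance attribution. A sketch using mean |weight x gradient| as a simple saliency stand-in (synthetic gradients; the paper's attribution method and thresholds differ):

```python
# Hedged sketch: rank layers by a saliency score and mark the lowest-scoring
# ones as candidates for approximate multipliers. Weights and gradients are
# synthetic placeholders for a trained AxDNN.
import numpy as np

rng = np.random.default_rng(0)
layers = {f"conv{i}": (rng.normal(size=(64, 64)),            # weights
                       rng.normal(scale=s, size=(64, 64)))   # gradients (synthetic)
          for i, s in enumerate([1.0, 0.1, 0.02, 0.5])}

importance = {name: float(np.mean(np.abs(w * g)))            # |weight x grad|
              for name, (w, g) in layers.items()}
threshold = np.quantile(list(importance.values()), 0.5)
non_critical = [n for n, s in importance.items() if s < threshold]
print("candidate layers for approximation:", non_critical)
```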
When employing an evolutionary algorithm to optimize a neural network's architecture, developers face the added challenge of tuning the evolutionary algorithm's own hyperparameters: population size, mutation rate, cloning rate, and number of generations. This paper introduces Neuvo Ecological Neural Architecture Search (ENAS), a novel method that incorporates these evolutionary parameters directly into the candidate solutions' phenotypes, allowing them to evolve dynamically alongside architecture specifications. Experimental results across four binary classification datasets demonstrate that ENAS not only eliminates manual tuning of evolutionary parameters but also outperforms competitor NAS methodologies in convergence speed (reducing computational time by 18.3%) and accuracy (improving classification performance on 3 out of 4 datasets). By enabling "greedy individuals" to optimize resource allocation based on fitness, ENAS provides an efficient, self-regulating approach to neural architecture search.
https://arxiv.org/abs/2503.10908
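The core mechanism, evolutionary hyperparameters living inside the genome, fits in a few lines. Gene ranges and the fitness stub below are illustrative, not the paper's encoding:

```python
# Hedged sketch: each individual carries its own mutation rate next to its
# architecture genes, so the rate is selected for rather than hand-tuned.
import random

def make_individual() -> dict:
    return {"layers": random.randint(1, 6),
            "units": random.choice([16, 32, 64]),
            "mutation_rate": random.uniform(0.01, 0.5)}   # evolved, not fixed

def mutate(ind: dict) -> dict:
    child = dict(ind)
    if random.random() < ind["mutation_rate"]:
        child["units"] = random.choice([16, 32, 64])
    # the mutation rate itself mutates, letting selection tune it
    child["mutation_rate"] = min(0.5, max(0.01,
        ind["mutation_rate"] * random.uniform(0.8, 1.25)))
    return child

def fitness(ind: dict) -> float:
    return random.random()                                # stub for val accuracy

pop = [make_individual() for _ in range(10)]
for _ in range(20):
    pop.sort(key=fitness, reverse=True)
    pop = pop[:5] + [mutate(random.choice(pop[:5])) for _ in range(5)]
print("surviving mutation rates:", [round(p["mutation_rate"], 2) for p in pop[:3]])
```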
The choice of neural network features can have a large impact on both the accuracy and speed of a network. Despite the current industry shift towards large transformer models, specialized binary classifiers remain critical for numerous practical applications where computational efficiency and low latency are essential. Neural network features tend to be developed homogeneously, resulting in slower or less accurate networks when tested against multiple datasets. In this paper, we show the effectiveness of Neuvo NAS+, a novel Python implementation of an extended Neural Architecture Search (NAS+) that allows the user to optimise a network's training parameters as well as its architecture. We provide an in-depth analysis of the importance of tailoring a network's architecture to each dataset. We also describe the design of the Neuvo NAS+ system, which selects network features on a task-specific basis, including network training hyper-parameters such as the number of epochs and batch size. Results show that the Neuvo NAS+ task-specific approach significantly outperforms several machine learning approaches, such as Naive Bayes, C4.5, Support Vector Machines, and a standard Artificial Neural Network, in accuracy across a range of binary classification problems. Our experiments demonstrate substantial diversity in evolved network architectures across different datasets, confirming the value of task-specific optimization. Additionally, Neuvo NAS+ outperforms other evolutionary algorithm optimisers in both accuracy and computational efficiency, showing that properly optimized binary classifiers can match or exceed the performance of more complex models while requiring significantly fewer computational resources.
https://arxiv.org/abs/2503.10869
N-shot neural architecture search (NAS) exploits a supernet containing all candidate subnets for a given search space. The subnets are typically trained with a static training strategy (e.g., using the same learning rate (LR) scheduler and optimizer for all subnets). This, however, does not account for individual subnets having distinct characteristics, leading to two problems: (1) the supernet training is biased towards the low-complexity subnets (unfairness); (2) the momentum update in the supernet is noisy (noisy momentum). We present a dynamic supernet training technique that addresses these problems by adapting the training strategy to the subnets. Specifically, we introduce a complexity-aware LR scheduler (CaLR) that controls the decay ratio of the LR adaptively to each subnet's complexity, which alleviates the unfairness problem. We also present a momentum separation technique (MS) that groups subnets with similar structural characteristics and uses a separate momentum for each group, avoiding the noisy momentum problem. Our approach is applicable to various N-shot NAS methods at marginal cost, while improving search performance drastically. We validate the effectiveness of our approach on various search spaces (e.g., NAS-Bench-201, MobileNet spaces) and datasets (e.g., CIFAR-10/100, ImageNet).
https://arxiv.org/abs/2503.10740
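A possible shape for a complexity-aware LR schedule: lighter subnets decay faster so the supernet stops over-serving them. The mapping from complexity to decay speed below is an assumption, not the paper's CaLR formula:

```python
# Hedged sketch: cosine decay whose sharpness depends on subnet complexity.
import math

def calr(step: int, total: int, base_lr: float, complexity: float) -> float:
    """complexity in [0, 1]: 0 = lightest subnet, 1 = heaviest."""
    decay_power = 2.0 - complexity            # light: fast decay, heavy: slow decay
    progress = step / total
    return base_lr * (0.5 * (1 + math.cos(math.pi * progress))) ** decay_power

for c in (0.0, 1.0):
    lrs = [round(calr(s, 100, 0.1, c), 4) for s in (0, 50, 90)]
    print(f"complexity={c}: lr at steps 0/50/90 -> {lrs}")
```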
Neural Architecture Search (NAS) has become an essential tool for designing effective and efficient neural networks. In this paper, we investigate the geometric properties of neural architecture spaces commonly used in differentiable NAS methods, specifically NAS-Bench-201 and DARTS. By defining flatness metrics such as neighborhoods and loss barriers along paths in architecture space, we reveal locality and flatness characteristics analogous to the well-known properties of neural network loss landscapes in weight space. In particular, we find that highly accurate architectures cluster together in flat regions, while suboptimal architectures remain isolated, unveiling the detailed geometrical structure of the architecture search landscape. Building on these insights, we propose Architecture-Aware Minimization (A²M), a novel analytically derived algorithmic framework that explicitly biases, for the first time, the gradient of differentiable NAS methods towards flat minima in architecture space. A²M consistently improves generalization over state-of-the-art DARTS-based algorithms on benchmark datasets including CIFAR-10, CIFAR-100, and ImageNet16-120, across both NAS-Bench-201 and DARTS search spaces. Notably, A²M is able to increase the test accuracy, on average across different differentiable NAS methods, by +3.60% on CIFAR-10, +4.60% on CIFAR-100, and +3.64% on ImageNet16-120, demonstrating its superior effectiveness in practice. A²M can be easily integrated into existing differentiable NAS frameworks, offering a versatile tool for future research and applications in automated machine learning. We open-source our code at this https URL.
https://arxiv.org/abs/2503.10404
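The flatness probe is easy to state concretely: evaluate the loss along a straight path between two architecture encodings and measure the worst rise above the endpoints' chord. A toy loss below stands in for the validation loss of decoded architectures:

```python
# Hedged sketch: loss barrier along a linear path in a continuous architecture
# encoding. The quadratic-plus-ripple loss is a synthetic stand-in.
import numpy as np

def loss(alpha: np.ndarray) -> float:
    return float(np.sum(alpha**2) + 0.3 * np.sin(5 * alpha).sum())

def loss_barrier(a: np.ndarray, b: np.ndarray, steps: int = 21) -> float:
    ts = np.linspace(0.0, 1.0, steps)
    path = [loss((1 - t) * a + t * b) for t in ts]
    chord = [(1 - t) * path[0] + t * path[-1] for t in ts]
    return max(p - c for p, c in zip(path, chord))   # max rise above the chord

rng = np.random.default_rng(0)
a, b = rng.normal(size=4), rng.normal(size=4)
print(f"barrier between the two architectures: {loss_barrier(a, b):.3f}")
# Low barriers between high-accuracy architectures indicate the flat,
# connected regions that A²M biases the search toward.
```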
Differentiable Neural Architecture Search (NAS) provides a promising avenue for automating the complex design of deep learning (DL) models. However, current differentiable NAS methods often face constraints in efficiency, operation selection, and adaptability under varying resource limitations. We introduce ZO-DARTS++, a novel NAS method that effectively balances performance and resource constraints. By integrating a zeroth-order approximation for efficient gradient handling, employing a sparsemax function with temperature annealing for clearer and more interpretable architecture distributions, and adopting a size-variable search scheme for generating compact yet accurate architectures, ZO-DARTS++ establishes a new balance between model complexity and performance. In extensive tests on medical imaging datasets, ZO-DARTS++ improves the average accuracy by up to 1.8% over standard DARTS-based methods and shortens search time by approximately 38.6%. Additionally, its resource-constrained variants can reduce the number of parameters by more than 35% while maintaining competitive accuracy levels. Thus, ZO-DARTS++ offers a versatile and efficient framework for generating high-quality, resource-aware DL models suitable for real-world medical applications.
https://arxiv.org/abs/2503.06092
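Sparsemax with temperature annealing, the piece that yields sparse, interpretable operation distributions, fits in a few lines (logits are illustrative):

```python
# Hedged sketch: sparsemax maps operation logits to a probability vector that
# can contain exact zeros; lowering the temperature anneals it toward one-hot.
import numpy as np

def sparsemax(z: np.ndarray) -> np.ndarray:
    zs = np.sort(z)[::-1]                     # sort logits descending
    k = np.arange(1, z.size + 1)
    cssv = np.cumsum(zs) - 1.0
    support = zs - cssv / k > 0               # ops that keep nonzero probability
    tau = cssv[support][-1] / k[support][-1]  # threshold for simplex projection
    return np.maximum(z - tau, 0.0)

logits = np.array([1.2, 1.0, 0.2, -0.5])      # scores for 4 candidate ops
for temp in (2.0, 0.5, 0.1):                  # annealing schedule
    p = sparsemax(logits / temp)
    print(f"T={temp:3.1f} -> {np.round(p, 2)}")  # more exact zeros as T shrinks
```

Unlike softmax, which always assigns every operation some mass, sparsemax drives weak candidates to exactly zero, which is what makes the resulting architecture distributions easier to read off.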