Medical Image Segmentation (MIS) includes diverse tasks, from bone to organ segmentation, each with its own challenges in finding the best segmentation model. The state-of-the-art AutoML-related MIS framework nnU-Net automates many aspects of model configuration but remains constrained by fixed hyperparameters and heuristic design choices. As a full-AutoML framework for MIS, we propose Auto-nnU-Net, a novel nnU-Net variant enabling hyperparameter optimization (HPO), neural architecture search (NAS), and hierarchical NAS (HNAS). Additionally, we propose Regularized PriorBand to balance model accuracy with the computational resources required for training, addressing the resource constraints often faced in real-world medical settings that limit the feasibility of extensive training procedures. We evaluate our approach across diverse MIS datasets from the well-established Medical Segmentation Decathlon, analyzing the impact of AutoML techniques on segmentation performance, computational efficiency, and model design choices. The results demonstrate that our AutoML approach substantially improves the segmentation performance of nnU-Net on 6 out of 10 datasets and performs on par on the remaining datasets while maintaining practical resource requirements. Our code is available at this https URL.
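To make the resource-awareness concrete, below is a minimal Python sketch of a cost-regularized HPO objective in the spirit of Regularized PriorBand. The penalty form, the `lambda_cost` weight, and the candidate fields are illustrative assumptions, not the formulation used by Auto-nnU-Net.

```python
# Hypothetical sketch: a cost-regularized HPO objective. The exact formulation
# of Regularized PriorBand is not given in the abstract; the penalty form and
# lambda_cost below are illustrative assumptions.

def regularized_objective(val_dice: float,
                          train_hours: float,
                          budget_hours: float,
                          lambda_cost: float = 0.1) -> float:
    """Lower is better: segmentation error plus a penalty for training cost."""
    seg_error = 1.0 - val_dice                       # Dice score lies in [0, 1]
    cost_penalty = lambda_cost * (train_hours / budget_hours)
    return seg_error + cost_penalty

# A multi-fidelity HPO loop (e.g. PriorBand/Hyperband) would rank candidate
# configurations by this score at each fidelity instead of by raw loss alone.
candidates = [
    {"lr": 1e-2, "patch": 128, "val_dice": 0.87, "train_hours": 6.0},
    {"lr": 3e-3, "patch": 192, "val_dice": 0.88, "train_hours": 14.0},
]
scores = [regularized_objective(c["val_dice"], c["train_hours"], budget_hours=24.0)
          for c in candidates]
print(candidates[scores.index(min(scores))])
```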
https://arxiv.org/abs/2505.16561
Integrating Large Language Models (LLMs) and Evolutionary Computation (EC) represents a promising avenue for advancing artificial intelligence by combining powerful natural language understanding with optimization and search capabilities. This manuscript explores the synergistic potential of LLMs and EC, reviewing their intersections, complementary strengths, and emerging applications. We identify key opportunities where EC can enhance LLM training, fine-tuning, prompt engineering, and architecture search, while LLMs can, in turn, aid in automating the design, analysis, and interpretation of ECs. The manuscript explores the synergistic integration of EC and LLMs, highlighting their bidirectional contributions to advancing artificial intelligence. It first examines how EC techniques enhance LLMs by optimizing key components such as prompt engineering, hyperparameter tuning, and architecture search, demonstrating how evolutionary methods automate and refine these processes. Secondly, the survey investigates how LLMs improve EC by automating metaheuristic design, tuning evolutionary algorithms, and generating adaptive heuristics, thereby increasing efficiency and scalability. Emerging co-evolutionary frameworks are discussed, showcasing applications across diverse fields while acknowledging challenges like computational costs, interpretability, and algorithmic convergence. The survey concludes by identifying open research questions and advocating for hybrid approaches that combine the strengths of EC and LLMs.
https://arxiv.org/abs/2505.15741
Spiking Neural Networks (SNNs) are promising biologically plausible models of computation which utilize a spiking binary activation function similar to that of biological neurons. SNNs are well positioned to process spatiotemporal data, and are advantageous in ultra-low power and real-time processing. Despite a large body of work on conventional artificial neural network accelerators, much less attention has been given to efficient SNN hardware accelerator design. In particular, SNNs exhibit inherent unstructured spatial and temporal firing sparsity, an opportunity yet to be fully exploited for greater hardware processing efficiency. In this work, we propose a novel systolic-array SNN accelerator architecture, called SpikeX, to take on the challenges and opportunities stemming from unstructured sparsity while taking into account the unique characteristics of spike-based computation. By developing an efficient dataflow targeting expensive multi-bit weight data movements, SpikeX reduces memory access and increases data sharing and hardware utilization for computations spanning across both time and space, thereby significantly improving energy efficiency and reducing inference latency. Furthermore, recognizing the importance of SNN network and hardware co-design, we develop a co-optimization methodology facilitating not only hardware-aware SNN training but also hardware accelerator architecture search, allowing joint network weight parameter optimization and accelerator architectural reconfiguration. This end-to-end network/accelerator co-design approach offers a significant 15.1x-150.87x reduction in energy-delay product (EDP) without compromising model accuracy.
https://arxiv.org/abs/2505.12292
Estimating the network performance using zero-cost (ZC) metrics has proven both its efficiency and efficacy in Neural Architecture Search (NAS). However, a notable limitation of most ZC proxies is their inconsistency, as reflected by the substantial variation in their performance across different problems. Furthermore, the design of existing ZC metrics is manual, involving a time-consuming trial-and-error process that requires substantial domain expertise. These challenges raise two critical questions: (1) Can we automate the design of ZC metrics? and (2) Can we utilize the existing hand-crafted ZC metrics to synthesize a more generalizable one? In this study, we propose a framework based on Symbolic Regression via Genetic Programming to automate the design of ZC metrics. Our framework is not only highly extensible but also capable of quickly producing a ZC metric with a strong positive rank correlation to true network performance across diverse NAS search spaces and tasks. Extensive experiments on 13 problems from NAS-Bench-Suite-Zero demonstrate that our automatically generated proxies consistently outperform hand-crafted alternatives. Using our evolved proxy metric as the search objective in an evolutionary algorithm, we could identify network architectures with competitive performance within 15 minutes using a single consumer GPU.
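As a rough illustration of the idea, the following sketch evolves a small expression tree over hand-crafted proxy scores and ranks candidates by Kendall correlation with placeholder ground-truth accuracies. The proxy names, operator set, and mutation-only loop are simplifications for illustration, not the paper's actual GP setup or benchmark data.

```python
# Minimal sketch of evolving a zero-cost metric by symbolic regression.
# Proxy values and "true" accuracies are random placeholders here; a real run
# would use scores from NAS-Bench-Suite-Zero.
import random
import numpy as np
from scipy.stats import kendalltau

PROXIES = ["synflow", "snip", "grad_norm", "jacov"]       # leaf features
OPS = {"+": np.add, "*": np.multiply, "max": np.maximum}  # binary operators

def random_tree(depth=2):
    """An expression is either a proxy name or (op, left, right)."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(PROXIES)
    op = random.choice(list(OPS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, table):
    if isinstance(tree, str):
        return table[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, table), evaluate(right, table))

def mutate(tree, depth=2):
    # Simplified variation operator: occasionally replace a tree wholesale.
    return random_tree(depth) if random.random() < 0.5 else tree

# Placeholder benchmark: 100 architectures, 4 proxy scores each, plus accuracy.
rng = np.random.default_rng(0)
table = {p: rng.random(100) for p in PROXIES}
accuracy = 0.5 * table["synflow"] + 0.3 * table["snip"] + 0.1 * rng.random(100)

def fitness(tree):
    tau, _ = kendalltau(evaluate(tree, table), accuracy)
    return tau

population = [random_tree() for _ in range(30)]
for _ in range(20):                                        # simplified GP loop
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]
    population = survivors + [mutate(t) for t in survivors for _ in range(2)]

best = max(population, key=fitness)
print("best expression:", best, "Kendall tau:", round(fitness(best), 3))
```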
https://arxiv.org/abs/2505.15832
The growing use of smartphones and IoT devices necessitates efficient time-series analysis on resource-constrained hardware, which is critical for sensing applications such as human activity recognition and air quality prediction. Recent efforts in hardware-aware neural architecture search (NAS) automate architecture discovery for specific platforms; however, none focus on general time-series analysis with edge deployment. Leveraging the problem-solving and reasoning capabilities of large language models (LLM), we propose MONAQ, a novel framework that reformulates NAS into Multi-Objective Neural Architecture Querying tasks. MONAQ is equipped with multimodal query generation for processing multimodal time-series inputs and hardware constraints, alongside an LLM agent-based multi-objective search to achieve deployment-ready models via code generation. By integrating numerical data, time-series images, and textual descriptions, MONAQ improves an LLM's understanding of time-series data. Experiments on fifteen datasets demonstrate that MONAQ-discovered models outperform both handcrafted models and NAS baselines while being more efficient.
https://arxiv.org/abs/2505.10607
Incremental learning is a machine learning paradigm where a model learns from a sequential stream of tasks. This setting poses a key challenge: balancing plasticity (learning new tasks) and stability (preserving past knowledge). Neural Architecture Search (NAS), a branch of AutoML, automates the design of the architecture of Deep Neural Networks and has shown success in static settings. However, existing NAS-based approaches to incremental learning often rely on expanding the model at every task, making them impractical in resource-constrained environments. In this work, we introduce SEAL, a NAS-based framework tailored for data-incremental learning, a scenario where disjoint data samples arrive sequentially and are not stored for future access. SEAL adapts the model structure dynamically by expanding it only when necessary, based on a capacity estimation metric. Stability is preserved through cross-distillation training after each expansion step. The NAS component jointly searches for both the architecture and the optimal expansion policy. Experiments across multiple benchmarks demonstrate that SEAL effectively reduces forgetting and enhances accuracy while maintaining a lower model size compared to prior methods. These results highlight the promise of combining NAS and selective expansion for efficient, adaptive learning in incremental scenarios.
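The stability mechanism can be illustrated with a standard distillation loss: the sketch below blends task cross-entropy with a KL term against the frozen pre-expansion model, under the assumption that SEAL's cross-distillation follows this general pattern. The temperature and `alpha` mixing weight are made-up values.

```python
# Hedged sketch of a distillation-style stability loss after expansion: the
# frozen pre-expansion model acts as teacher for the expanded model on new
# data. SEAL's exact cross-distillation formulation may differ.
import torch
import torch.nn.functional as F

def stability_distillation_loss(student_logits, teacher_logits, targets,
                                alpha=0.5, temperature=2.0):
    """Task cross-entropy blended with KL to the frozen pre-expansion model."""
    task_loss = F.cross_entropy(student_logits, targets)
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * task_loss + (1.0 - alpha) * kd_loss

# Usage inside the incremental loop (teacher = copy of the model before expansion):
# loss = stability_distillation_loss(expanded_model(x), frozen_model(x).detach(), y)
```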
https://arxiv.org/abs/2505.10457
Variational quantum algorithms hold the promise to address meaningful quantum problems already on noisy intermediate-scale quantum hardware, but they face the challenge of designing quantum circuits that both solve the target problem and comply with device limitations. Quantum architecture search (QAS) automates this design process, with reinforcement learning (RL) emerging as a promising approach. Yet, RL-based QAS methods encounter significant scalability issues, as computational and training costs grow rapidly with the number of qubits, circuit depth, and noise, severely impacting performance. To address these challenges, we introduce $\textit{TensorRL-QAS}$, a scalable framework that combines tensor network (TN) methods with RL for designing quantum circuits. By warm-starting the architecture search with a matrix product state approximation of the target solution, TensorRL-QAS effectively narrows the search space to physically meaningful circuits, accelerating convergence to the desired solution. Tested on several quantum chemistry problems of up to 12 qubits, TensorRL-QAS achieves up to a 10-fold reduction in CNOT count and circuit depth compared to baseline methods, while maintaining or surpassing chemical accuracy. It reduces function evaluations by up to 100-fold, accelerates training episodes by up to $98\%$, and achieves up to $50\%$ success probability for 10-qubit systems, far exceeding the $<1\%$ rates of baseline approaches. Robustness and versatility are demonstrated in both noiseless and noisy scenarios, where we report simulations of up to 8 qubits. These advancements establish TensorRL-QAS as a promising candidate for a scalable and efficient quantum circuit discovery protocol on near-term quantum hardware.
https://arxiv.org/abs/2505.09371
Determining the performance of a Deep Neural Network during Neural Architecture Search processes is essential for identifying optimal architectures and hyperparameters. Traditionally, this process requires training and evaluation of each network, which is time-consuming and resource-intensive. Zero-cost proxies estimate performance without training, serving as an alternative to traditional training. However, recent proxies often lack generalization across diverse scenarios and provide only relative rankings rather than predicted accuracies. To address these limitations, we propose GreenFactory, an ensemble of zero-cost proxies that leverages a random forest regressor to combine multiple predictors' strengths and directly predict model test accuracy. We evaluate GreenFactory on NATS-Bench, achieving robust results across multiple datasets. Specifically, GreenFactory achieves high Kendall correlations on NATS-Bench-SSS, indicating substantial agreement between its predicted scores and actual performance: 0.907 for CIFAR-10, 0.945 for CIFAR-100, and 0.920 for ImageNet-16-120. Similarly, on NATS-Bench-TSS, we achieve correlations of 0.921 for CIFAR-10, 0.929 for CIFAR-100, and 0.908 for ImageNet-16-120, showcasing its reliability in both search spaces.
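A minimal sketch of the ensembling idea, assuming the proxy scores are precomputed per architecture: a scikit-learn random forest maps proxy-score vectors to predicted test accuracy, and Kendall's tau measures rank agreement on held-out architectures. The data below are random placeholders, not NATS-Bench.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from scipy.stats import kendalltau

rng = np.random.default_rng(0)
n_archs, n_proxies = 500, 8
proxy_scores = rng.random((n_archs, n_proxies))           # one row per architecture
test_accuracy = proxy_scores @ rng.random(n_proxies) + 0.05 * rng.random(n_archs)

train, test = slice(0, 400), slice(400, 500)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(proxy_scores[train], test_accuracy[train])

predicted = model.predict(proxy_scores[test])
tau, _ = kendalltau(predicted, test_accuracy[test])
print(f"Kendall tau on held-out architectures: {tau:.3f}")
```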
https://arxiv.org/abs/2505.09344
In this paper, we propose a novel attention module termed the Differentiable Channel Selection Attention module, or the DCS-Attention module. In contrast with conventional self-attention, the DCS-Attention module features selection of informative channels in the computation of the attention weights. The selection of the feature channels is performed in a differentiable manner, enabling seamless integration with DNN training. Our DCS-Attention is compatible with either fixed neural network backbones or learnable backbones with Differentiable Neural Architecture Search (DNAS), leading to DCS with Fixed Backbone (DCS-FB) and DCS-DNAS, respectively. Importantly, our DCS-Attention is motivated by the principle of Information Bottleneck (IB), and a novel variational upper bound for the IB loss, which can be optimized by SGD, is derived and incorporated into the training loss of the networks with the DCS-Attention modules. In this manner, a neural network with DCS-Attention modules is capable of selecting the most informative channels for feature extraction so that it enjoys state-of-the-art performance for the Re-ID task. Extensive experiments on multiple person Re-ID benchmarks using both DCS-FB and DCS-DNAS show that DCS-Attention significantly enhances the prediction accuracy of DNNs for person Re-ID, which demonstrates the effectiveness of DCS-Attention in learning discriminative features critical to identifying person identities. The code of our work is available at this https URL.
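The channel-selection idea can be sketched with a generic differentiable gate. The PyTorch toy module below applies sigmoid-relaxed per-channel gates to queries and keys before computing attention weights; it is only an illustration of differentiable channel selection, not the authors' DCS-Attention module or its IB-based variational bound.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedChannelAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.gate_logits = nn.Parameter(torch.zeros(dim))  # one learnable gate per channel

    def forward(self, x):                                  # x: (batch, tokens, dim)
        gates = torch.sigmoid(self.gate_logits)            # soft, differentiable channel selection
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k = q * gates, k * gates                        # gated channels drive the attention weights
        attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v

x = torch.randn(2, 16, 64)
print(GatedChannelAttention(64)(x).shape)                  # torch.Size([2, 16, 64])
```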
https://arxiv.org/abs/2505.08961
The rapid advancements in quantum computing (QC) and machine learning (ML) have led to the emergence of quantum machine learning (QML), which integrates the strengths of both fields. Among QML approaches, variational quantum circuits (VQCs), also known as quantum neural networks (QNNs), have shown promise both empirically and theoretically. However, their broader adoption is hindered by reliance on quantum hardware during inference. Hardware imperfections and limited access to quantum devices pose practical challenges. To address this, the Quantum-Train (QT) framework leverages the exponential scaling of quantum amplitudes to generate classical neural network parameters, enabling inference without quantum hardware and achieving significant parameter compression. Yet, designing effective quantum circuit architectures for such quantum-enhanced neural programmers remains non-trivial and often requires expertise in quantum information science. In this paper, we propose an automated solution using differentiable optimization. Our method jointly optimizes both conventional circuit parameters and architectural parameters in an end-to-end manner via automatic differentiation. We evaluate the proposed framework on classification, time-series prediction, and reinforcement learning tasks. Simulation results show that our method matches or outperforms manually designed QNN architectures. This work offers a scalable and automated pathway for designing QNNs that can generate classical neural network parameters across diverse applications.
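A classical analogue of the differentiable search is sketched below in a DARTS-like style: architecture choices are relaxed into softmax weights over candidate operations and trained jointly with ordinary weights by automatic differentiation. The paper applies this to quantum circuit structure; the candidate ops here are classical stand-ins.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.candidates = nn.ModuleList([
            nn.Linear(dim, dim),
            nn.Sequential(nn.Linear(dim, dim), nn.Tanh()),
            nn.Identity(),
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.candidates)))  # architecture parameters

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.candidates))

layer = MixedOp(8)
opt = torch.optim.Adam(layer.parameters(), lr=1e-2)   # weights and alpha optimized together
x, y = torch.randn(32, 8), torch.randn(32, 8)
for _ in range(100):
    opt.zero_grad()
    loss = F.mse_loss(layer(x), y)
    loss.backward()
    opt.step()
print("learned op preferences:", F.softmax(layer.alpha, dim=0).detach())
```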
https://arxiv.org/abs/2505.09653
Despite the growing interest in Explainable Artificial Intelligence (XAI), explainability is rarely considered during hyperparameter tuning or neural architecture optimization, where the focus remains primarily on minimizing predictive loss. In this work, we introduce the novel concept of XAI consistency, defined as the agreement among different feature attribution methods, and propose new metrics to quantify it. For the first time, we integrate XAI consistency directly into the hyperparameter tuning objective, creating a multi-objective optimization framework that balances predictive performance with explanation robustness. Implemented within the Sequential Parameter Optimization Toolbox (SPOT), our approach uses both weighted aggregation and desirability-based strategies to guide model selection. Through our proposed framework and supporting tools, we explore the impact of incorporating XAI consistency into the optimization process. This enables us to characterize distinct regions in the architecture configuration space: one region with poor performance and comparatively low interpretability, another with strong predictive performance but weak interpretability due to low XAI consistency, and a trade-off region that balances both objectives by offering high interpretability alongside competitive performance. Beyond introducing this novel approach, our research provides a foundation for future investigations into whether models from the trade-off zone, which balance performance loss and XAI consistency, exhibit greater robustness by avoiding overfitting to training performance, thereby leading to more reliable predictions on out-of-distribution data.
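One plausible instantiation of such a consistency score is the average pairwise rank correlation between attributions from different methods; the sketch below also shows a weighted aggregation of predictive loss and consistency as a tuning objective. The attribution values, weights, and metric definition are illustrative and may differ from the paper's proposed metrics.

```python
import itertools
import numpy as np
from scipy.stats import spearmanr

def xai_consistency(attributions):
    """attributions maps method name -> per-feature attribution vector."""
    pairs = itertools.combinations(attributions.values(), 2)
    corrs = [spearmanr(a, b)[0] for a, b in pairs]
    return float(np.mean(corrs))

attr = {
    "shap":       np.array([0.40, 0.10, 0.30, 0.20]),
    "perm_imp":   np.array([0.35, 0.15, 0.28, 0.22]),
    "grad_x_inp": np.array([0.10, 0.45, 0.25, 0.20]),
}
consistency = xai_consistency(attr)

# Weighted-aggregation tuning objective (lower is better): trade predictive
# loss against explanation agreement with a user-chosen weight w.
val_loss, w = 0.21, 0.3
score = (1 - w) * val_loss + w * (1.0 - consistency)
print(round(consistency, 3), round(score, 3))
```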
https://arxiv.org/abs/2505.07910
Training-free Neural Architecture Search (NAS) efficiently identifies high-performing neural networks using zero-cost (ZC) proxies. Unlike multi-shot and one-shot NAS approaches, ZC-NAS is both (i) time-efficient, eliminating the need for model training, and (ii) interpretable, with proxy designs often theoretically grounded. Despite rapid developments in the field, current SOTA ZC proxies are typically constrained to well-established convolutional search spaces. With the rise of Large Language Models shaping the future of deep learning, this work extends ZC proxy applicability to Vision Transformers (ViTs). We present a new benchmark using the Autoformer search space evaluated on 6 distinct tasks and propose Layer-Sample Wise Activation with Gradients information (L-SWAG), a novel, generalizable metric that characterizes both convolutional and transformer architectures across 14 tasks. Additionally, previous works highlighted how different proxies contain complementary information, motivating the need for a ML model to identify useful combinations. To further enhance ZC-NAS, we therefore introduce LIBRA-NAS (Low Information gain and Bias Re-Alignment), a method that strategically combines proxies to best represent a specific benchmark. Integrated into the NAS search, LIBRA-NAS outperforms evolution and gradient-based NAS techniques by identifying an architecture with a 17.0% test error on ImageNet1k in just 0.1 GPU days.
https://arxiv.org/abs/2505.07300
Domain generalization in image classification is a crucial challenge, with models often failing to generalize well across unseen datasets. We address this issue by introducing a neuro-inspired Neural Response Normalization (NeuRN) layer, which draws inspiration from neurons in the mammalian visual cortex and aims to enhance the performance of deep learning architectures on unseen target domains when models are trained only on a source domain. The performance of these models is taken as a baseline and compared against models integrated with NeuRN on image classification tasks. We perform experiments across a range of deep learning architectures, including ones derived from Neural Architecture Search and Vision Transformers. Additionally, to shortlist models for our experiments from the vast range of promising deep neural networks available, we propose a novel method that uses the Needleman-Wunsch algorithm to compute similarity between deep learning architectures, as sketched below. Our results demonstrate the effectiveness of NeuRN by showing improvement over the baseline in cross-domain image classification tasks. Our framework attempts to establish a foundation for future neuro-inspired deep learning models.
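The architecture-similarity step lends itself to a direct sketch, since Needleman-Wunsch is a standard global-alignment algorithm; the layer-type encoding and the match/mismatch/gap scoring scheme below are assumptions for illustration, as the abstract does not specify them.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Global alignment score between two sequences (higher = more similar)."""
    n, m = len(seq_a), len(seq_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if seq_a[i - 1] == seq_b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[n][m]

# Architectures encoded as coarse layer-type sequences (hypothetical encoding).
resnet_like = ["conv", "bn", "relu", "conv", "bn", "add", "pool", "fc"]
vit_like    = ["patchify", "attn", "mlp", "attn", "mlp", "norm", "fc"]
print(needleman_wunsch(resnet_like, resnet_like))   # 8: identical sequences
print(needleman_wunsch(resnet_like, vit_like))      # lower score: dissimilar architectures
```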
https://arxiv.org/abs/2505.06881
Underwater object detection using sonar imagery has become a critical and rapidly evolving research domain within marine technology. However, sonar images are characterized by lower resolution and sparser features compared to optical images, which seriously degrades the performance of object detection. To address these challenges, we propose a Detection Transformer (DETR) architecture optimized with a Neural Architecture Search (NAS) approach, called NAS-DETR, for object detection in sonar images. First, an improved zero-shot Neural Architecture Search (NAS) method based on the maximum entropy principle is proposed to identify a real-time, high-representational-capacity CNN-Transformer backbone for sonar image detection. This method enables the efficient discovery of high-performance network architectures with low computational and time overhead. Subsequently, the backbone is combined with a Feature Pyramid Network (FPN) and a deformable attention-based Transformer decoder to construct a complete network architecture. This architecture integrates various advanced components and training schemes to enhance overall performance. Extensive experiments demonstrate that this architecture achieves state-of-the-art performance on two representative datasets, while maintaining minimal overhead in real-time efficiency and computational complexity. Furthermore, correlation analysis between the key parameters and the differential-entropy-based fitness function is performed to enhance the interpretability of the proposed framework. To the best of our knowledge, this is the first work in the field of sonar object detection to integrate the DETR architecture with a NAS search mechanism.
https://arxiv.org/abs/2505.06694
We propose an automated framework for quantum circuit design that integrates large language models (LLMs) with evolutionary optimization to overcome the rigidity, scalability limitations, and expert dependence of traditional circuit-design approaches in variational quantum algorithms. Our approach (FunSearch) autonomously discovers hardware-efficient ansätze entirely from scratch, with the new features of scalability and a system-size-independent number of variational parameters. Demonstrations on the Ising and XY spin chains with n = 9 qubits yield circuits containing 4 parameters, achieving near-exact energy extrapolation across system sizes. Implementations on quantum hardware (Zuchongzhi chip) validate practicality, where two-qubit quantum gate noise can be effectively mitigated via zero-noise extrapolation for a spin-chain system as large as 20 sites. This framework bridges algorithmic design and experimental constraints, complementing contemporary quantum architecture search frameworks to advance scalable quantum simulations.
https://arxiv.org/abs/2505.06347
Edge-cloud collaborative computing (ECCC) has emerged as a pivotal paradigm for addressing the computational demands of modern intelligent applications, integrating cloud resources with edge devices to enable efficient, low-latency processing. Recent advancements in AI, particularly deep learning and large language models (LLMs), have dramatically enhanced the capabilities of these distributed systems, yet introduce significant challenges in model deployment and resource management. In this survey, we comprehensively examine the intersection of distributed intelligence and model optimization within edge-cloud environments, providing a structured tutorial on fundamental architectures, enabling technologies, and emerging applications. Additionally, we systematically analyze model optimization approaches, including compression, adaptation, and neural architecture search, alongside AI-driven resource management strategies that balance performance, energy efficiency, and latency requirements. We further explore critical aspects of privacy protection and security enhancement within ECCC systems and examine practical deployments through diverse applications, spanning autonomous driving, healthcare, and industrial automation. Performance analysis and benchmarking techniques are also thoroughly explored to establish evaluation standards for these complex systems. Furthermore, the review identifies critical research directions including LLM deployment, 6G integration, neuromorphic computing, and quantum computing, offering a roadmap for addressing persistent challenges in heterogeneity management, real-time processing, and scalability. By bridging theoretical advancements and practical deployments, this survey offers researchers and practitioners a holistic perspective on leveraging AI to optimize distributed computing environments, fostering innovation in next-generation intelligent systems.
https://arxiv.org/abs/2505.01821
This paper presents a neural architecture search method based on the Transformer architecture, searching over cross multi-head attention computation schemes for different numbers of encoder and decoder combinations. To search for neural network structures with better translation results, we consider perplexity as an auxiliary evaluation metric for the algorithm in addition to BLEU scores, and iteratively improve each individual neural network within the population with a multi-objective genetic algorithm. Experimental results show that the neural network structures found by the algorithm outperform all the baseline models, and that introducing the auxiliary evaluation metric yields better models than considering only the BLEU score as an evaluation metric.
https://arxiv.org/abs/2505.01314
This paper proposes a neural architecture search space using ResNet as a framework, with the search covering parameters of the convolution, pooling, and fully connected layers, as well as the connectivity of the residual network. In addition to recognition accuracy, this paper uses the loss value on the validation set as a secondary objective for optimization. The experimental results demonstrate that the proposed search space, together with the optimization approach, can find competitive network architectures on the MNIST, Fashion-MNIST and CIFAR100 datasets.
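A minimal sketch of how such a candidate could be encoded and sampled is given below; the specific genes, value ranges, and the placeholder objective pair (accuracy to maximize, validation loss to minimize) are assumptions for illustration, not the paper's actual search space.

```python
import random
from dataclasses import dataclass

@dataclass
class Candidate:
    kernel_sizes: list      # per-stage convolution kernel size
    channels: list          # per-stage width
    pool_type: str          # "max" or "avg"
    fc_units: int           # fully connected layer width
    skip_mask: list         # which residual connections are kept

def sample_candidate(num_stages=4):
    return Candidate(
        kernel_sizes=[random.choice([3, 5]) for _ in range(num_stages)],
        channels=[random.choice([32, 64, 128]) for _ in range(num_stages)],
        pool_type=random.choice(["max", "avg"]),
        fc_units=random.choice([128, 256, 512]),
        skip_mask=[random.random() < 0.8 for _ in range(num_stages)],
    )

def objectives(candidate):
    """Placeholder: a real evaluation trains the decoded network and returns
    (validation accuracy to maximize, validation loss to minimize)."""
    acc = random.uniform(0.6, 0.95)
    return acc, 1.0 - acc + random.uniform(0.0, 0.1)

population = [sample_candidate() for _ in range(10)]
print(objectives(population[0]))
```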
https://arxiv.org/abs/2505.01313
The environmental impact of Artificial Intelligence (AI) is emerging as a significant global concern, particularly regarding model training. In this paper, we introduce GREEN (Guided Recommendations of Energy-Efficient Networks), a novel, inference-time approach for recommending Pareto-optimal AI model configurations that optimize validation performance and energy consumption across diverse AI domains and tasks. Our approach directly addresses the limitations of current eco-efficient neural architecture search methods, which are often restricted to specific architectures or tasks. Central to this work is EcoTaskSet, a dataset comprising training dynamics from over 1767 experiments across computer vision, natural language processing, and recommendation systems using both widely used and cutting-edge architectures. Leveraging this dataset and a prediction model, our approach demonstrates effectiveness in selecting the best model configuration based on user preferences. Experimental results show that our method successfully identifies energy-efficient configurations while ensuring competitive performance.
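The recommendation step ultimately reduces to extracting a Pareto front over predicted performance and energy; the sketch below filters a small set of made-up configurations to those not dominated on both objectives (the model names and numbers are illustrative, not EcoTaskSet predictions).

```python
def pareto_front(configs, perf_key="accuracy", cost_key="energy_kwh"):
    """Keep configs not dominated by any other (higher accuracy, lower energy)."""
    front = []
    for c in configs:
        dominated = any(
            o[perf_key] >= c[perf_key] and o[cost_key] <= c[cost_key]
            and (o[perf_key] > c[perf_key] or o[cost_key] < c[cost_key])
            for o in configs
        )
        if not dominated:
            front.append(c)
    return front

predicted = [
    {"model": "resnet50",     "accuracy": 0.91, "energy_kwh": 4.2},
    {"model": "mobilenet_v3", "accuracy": 0.88, "energy_kwh": 1.1},
    {"model": "vit_b16",      "accuracy": 0.90, "energy_kwh": 6.5},  # dominated by resnet50
]
for c in pareto_front(predicted):
    print(c["model"], c["accuracy"], c["energy_kwh"])
```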
https://arxiv.org/abs/2505.01468
We introduce the Llama-Nemotron series of models, an open family of heterogeneous reasoning models that deliver exceptional reasoning capabilities, inference efficiency, and an open license for enterprise use. The family comes in three sizes -- Nano (8B), Super (49B), and Ultra (253B) -- and performs competitively with state-of-the-art reasoning models such as DeepSeek-R1 while offering superior inference throughput and memory efficiency. In this report, we discuss the training procedure for these models, which entails using neural architecture search from Llama 3 models for accelerated inference, knowledge distillation, and continued pretraining, followed by a reasoning-focused post-training stage consisting of two main parts: supervised fine-tuning and large scale reinforcement learning. Llama-Nemotron models are the first open-source models to support a dynamic reasoning toggle, allowing users to switch between standard chat and reasoning modes during inference. To further support open research and facilitate model development, we provide the following resources: 1. We release the Llama-Nemotron reasoning models -- LN-Nano, LN-Super, and LN-Ultra -- under the commercially permissive NVIDIA Open Model License Agreement. 2. We release the complete post-training dataset: Llama-Nemotron-Post-Training-Dataset. 3. We also release our training codebases: NeMo, NeMo-Aligner, and Megatron-LM.
https://arxiv.org/abs/2505.00949