Neural Architecture Search (NAS) has emerged as a powerful approach for automating neural network design. However, existing NAS methods face critical limitations in real-world deployments: architectures lack adaptability across scenarios, each deployment context requires costly separate searches, and performance consistency across diverse platforms remains challenging. We propose DANCE (Dynamic Architectures with Neural Continuous Evolution), which reformulates architecture search as a continuous evolution problem through learning distributions over architectural components. DANCE introduces three key innovations: a continuous architecture distribution enabling smooth adaptation, a unified architecture space with learned selection gates for efficient sampling, and a multi-stage training strategy for effective deployment optimization. Extensive experiments across five datasets demonstrate DANCE's effectiveness. Our method consistently outperforms state-of-the-art NAS approaches in terms of accuracy while significantly reducing search costs. Under varying computational constraints, DANCE maintains robust performance while smoothly adapting architectures to different hardware requirements. The code and appendix can be found at this https URL.
https://arxiv.org/abs/2507.04671
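As a rough illustration of the "learned selection gates" mentioned in the DANCE abstract above, the sketch below implements a single gated layer that keeps a trainable distribution over candidate operations and samples it with a Gumbel-softmax relaxation. The candidate set, gate parameterization, and temperature are assumptions for illustration, not DANCE's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedOp(nn.Module):
    """Hypothetical selection gate: a learned distribution over candidate ops.

    Each forward pass samples a relaxed one-hot gate, so the supernet can be
    trained with gradients flowing through the gate logits.
    """
    def __init__(self, channels, candidates=None, tau=1.0):
        super().__init__()
        self.ops = nn.ModuleList(candidates or [
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),                      # skip connection
        ])
        self.logits = nn.Parameter(torch.zeros(len(self.ops)))  # gate parameters
        self.tau = tau

    def forward(self, x):
        gate = F.gumbel_softmax(self.logits, tau=self.tau, hard=False)
        return sum(g * op(x) for g, op in zip(gate, self.ops))

if __name__ == "__main__":
    layer = GatedOp(channels=8)
    out = layer(torch.randn(2, 8, 16, 16))
    print(out.shape, layer.logits.softmax(-1))  # current op distribution
```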
Reinforcement learning (RL) enables agents to learn optimal policies through environmental interaction. However, RL suffers from reduced learning efficiency due to the curse of dimensionality in high-dimensional spaces. Quantum reinforcement learning (QRL) addresses this issue by leveraging superposition and entanglement in quantum computing, allowing efficient handling of high-dimensional problems with fewer resources. QRL combines quantum neural networks (QNNs) with RL, where the parameterized quantum circuit (PQC) acts as the core computational module. The PQC performs linear and nonlinear transformations through gate operations, similar to hidden layers in classical neural networks. Previous QRL studies, however, have used fixed PQC structures based on empirical intuition without verifying their optimality. This paper proposes a QRL-NAS algorithm that integrates quantum neural architecture search (QNAS) to optimize PQC structures within QRL. Experiments demonstrate that QRL-NAS achieves higher rewards than QRL with fixed circuits, validating its effectiveness and practical utility.
https://arxiv.org/abs/2507.00589
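For readers unfamiliar with parameterized quantum circuits, here is a minimal PennyLane sketch of a PQC acting as the computational core of a QNN: angle encoding of the observation, trainable rotation layers, a CNOT entangling ring, and Pauli-Z expectation values as outputs. This specific gate layout is exactly the kind of hand-chosen structure the paper proposes to search over, so treat it as a baseline illustration rather than an optimized circuit.

```python
import numpy as np
import pennylane as qml

n_qubits, n_layers = 4, 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def pqc(inputs, weights):
    # Angle-encode the (classical) state observation.
    for i in range(n_qubits):
        qml.RY(inputs[i], wires=i)
    # Trainable layers: single-qubit rotations + entangling CNOT ring.
    for layer in range(n_layers):
        for i in range(n_qubits):
            qml.RY(weights[layer, i], wires=i)
        for i in range(n_qubits):
            qml.CNOT(wires=[i, (i + 1) % n_qubits])
    # One expectation value per action (here: 2 actions).
    return [qml.expval(qml.PauliZ(i)) for i in range(2)]

weights = np.random.uniform(0, np.pi, size=(n_layers, n_qubits))
print(pqc(np.array([0.1, 0.5, 0.9, 1.3]), weights))
```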
Transformer-based neural speech processing has achieved state-of-the-art performance. Since speech audio signals are known to be highly compressible, here we seek to accelerate neural speech transcription by time-domain signal sparsification early in the neural encoding stage, taking advantage of the interpretability of the self-attention mechanism in transformer audio encoders. With the Whisper family of models, we perform a systematic architecture search over the joint space of sparsification stage (a certain encoder layer) and compression ratio (sparsity). We found that the best resulting solutions under 1% accuracy degradation choose to sparsify the hidden state to 40-60% sparsity at an early encoding stage, and thereby achieve up to 1.6x runtime acceleration in English speech transcription tasks on Nvidia GPUs without any fine-tuning.
https://arxiv.org/abs/2506.15912
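A minimal sketch of the sparsification idea above: after an early encoder layer, keep only the top-k time frames of the hidden state according to an importance score (keep_ratio=0.5 corresponds to 50% sparsity, in the 40-60% range the paper reports). The norm-based fallback score is a stand-in; the paper leverages self-attention interpretability, and no Whisper internals are reproduced here.

```python
import torch

def sparsify_hidden_states(hidden, keep_ratio=0.5, scores=None):
    """Keep only the most important time frames of an encoder hidden state.

    hidden: (batch, time, dim) activations after some early encoder layer.
    scores: optional (batch, time) importance scores, e.g. aggregated
            self-attention weights; falls back to the L2 norm per frame.
    """
    if scores is None:
        scores = hidden.norm(dim=-1)                        # (batch, time)
    k = max(1, int(hidden.size(1) * keep_ratio))
    idx = scores.topk(k, dim=1).indices.sort(dim=1).values  # keep temporal order
    idx = idx.unsqueeze(-1).expand(-1, -1, hidden.size(-1))
    return hidden.gather(1, idx)                            # (batch, k, dim)

if __name__ == "__main__":
    h = torch.randn(2, 1500, 512)         # Whisper-sized encoder sequence
    h_sparse = sparsify_hidden_states(h, keep_ratio=0.5)
    print(h.shape, "->", h_sparse.shape)  # later layers now see 50% of the frames
```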
Reinforcement learning is essential for neural architecture search and hyperparameter optimization, but conventional approaches impede widespread adoption due to prohibitive time and computational costs. Inspired by the DeepSeek-V3 multi-token prediction architecture, we propose Sequential Policy Gradient modeling (SPG), a novel trajectory generation paradigm for lightweight online hyperparameter optimization. In contrast to conventional policy gradient methods, SPG extends the base model with temporary modules, enabling it to generate state-action (padded) trajectories in a single forward pass. Our experiments demonstrate that models gain performance when retrained with SPG on their original datasets and also outperform standard transfer fine-tuning. We evaluate on five datasets spanning computer vision (ImageNet, COCO), natural language processing (GLUE, SQuAD), and audio (SUPERB) to assess the industrial applicability of SPG. The proposed method demonstrates consistent improvements across widely adopted models, achieving performance gains of $+0.2\sim7\%$ at significantly lower computational cost. Fully reproducible code and pre-trained models: this https URL.
https://arxiv.org/abs/2506.15051
In order to address the scalability challenge within Neural Architecture Search (NAS), we speed up NAS training via dynamic hard example mining within a curriculum learning framework. By utilizing an autoencoder that enforces an image similarity embedding in latent space, we construct an efficient kd-tree structure to order images by furthest neighbour dissimilarity in a low-dimensional embedding. From a given query image from our subsample dataset, we can identify the most dissimilar image within the global dataset in logarithmic time. Via curriculum learning, we then dynamically re-formulate an unbiased subsample dataset for NAS optimisation, upon which the current NAS solution architecture performs poorly. We show that our DDS-NAS framework speeds up gradient-based NAS strategies by up to 27x without loss in performance. By maximising the contribution of each image sample during training, we reduce the duration of a NAS training cycle and the number of iterations required for convergence.
https://arxiv.org/abs/2506.14667
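The farthest-neighbour query is the concrete mechanism in the abstract above, so below is a self-contained branch-and-bound sketch: build a kd-tree over autoencoder latents and prune any subtree whose bounding box cannot contain a point farther from the query than the current best. The embedding data, splitting rule, and pruning details are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class KDNode:
    __slots__ = ("point", "left", "right", "lo", "hi")
    def __init__(self, point, left, right, lo, hi):
        self.point, self.left, self.right = point, left, right
        self.lo, self.hi = lo, hi                 # bounding box of the subtree

def build(points):
    if len(points) == 0:
        return None
    lo, hi = points.min(axis=0), points.max(axis=0)
    axis = np.argmax(hi - lo)                     # split along the widest dimension
    points = points[points[:, axis].argsort()]
    mid = len(points) // 2
    return KDNode(points[mid], build(points[:mid]), build(points[mid + 1:]), lo, hi)

def _upper_bound(q, node):
    # Largest possible distance from q to any point inside the node's box.
    return np.sqrt((np.maximum(np.abs(q - node.lo), np.abs(q - node.hi)) ** 2).sum())

def farthest(q, node, best=(None, -1.0)):
    """Branch-and-bound farthest-neighbour query (prunes boxes that cannot win)."""
    if node is None or _upper_bound(q, node) <= best[1]:
        return best
    d = np.linalg.norm(q - node.point)
    if d > best[1]:
        best = (node.point, d)
    children = sorted([node.left, node.right],
                      key=lambda c: -_upper_bound(q, c) if c else 0.0)
    for child in children:
        best = farthest(q, child, best)
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(5000, 8))   # autoencoder latents of the dataset
    tree = build(embeddings)
    query = embeddings[0]
    point, dist = farthest(query, tree)
    assert np.isclose(dist, np.linalg.norm(embeddings - query, axis=1).max())
    print("most dissimilar image at distance", round(dist, 3))
```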
Deep learning-based pathological image analysis presents unique challenges due to the practical constraints of network design. Most existing methods apply computer vision models directly to medical tasks, neglecting the distinct characteristics of pathological images. This mismatch often leads to computational inefficiencies, particularly in edge-computing scenarios. To address this, we propose a novel Network Similarity Directed Initialization (NSDI) strategy to improve the stability of neural architecture search (NAS). Furthermore, we introduce domain adaptation into one-shot NAS to better handle variations in staining and semantic scale across pathology datasets. Experiments on the BRACS dataset demonstrate that our method outperforms existing approaches, delivering both superior classification performance and clinically relevant feature localization.
https://arxiv.org/abs/2506.14176
Kernel size selection in Convolutional Neural Networks (CNNs) is a critical but often overlooked design decision that affects receptive field, feature extraction, computational cost, and model accuracy. This paper proposes the Best Kernel Size Estimation Function (BKSEF), a mathematically grounded and empirically validated framework for optimal, layer-wise kernel size determination. BKSEF balances information gain, computational efficiency, and accuracy improvements by integrating principles from information theory, signal processing, and learning theory. Extensive experiments on CIFAR-10, CIFAR-100, ImageNet-lite, ChestX-ray14, and GTSRB datasets demonstrate that BKSEF-guided architectures achieve up to 3.1 percent accuracy improvement and 42.8 percent reduction in FLOPs compared to traditional models using uniform 3x3 kernels. Two real-world case studies further validate the approach: one for medical image classification in a cloud-based setup, and another for traffic sign recognition on edge devices. The former achieved enhanced interpretability and accuracy, while the latter reduced latency and model size significantly, with minimal accuracy trade-off. These results show that kernel size can be an active, optimizable parameter rather than a fixed heuristic. BKSEF provides practical heuristics and theoretical support for researchers and developers seeking efficient and application-aware CNN designs. It is suitable for integration into neural architecture search pipelines and real-time systems, offering a new perspective on CNN optimization.
https://arxiv.org/abs/2506.14846
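The abstract does not give the BKSEF formula, so the snippet below is a purely hypothetical stand-in for the kind of trade-off it describes: score each candidate kernel size with a diminishing-returns information-gain term minus a FLOPs penalty, and pick the maximizer per layer. The gain model and the `sigma` and `lam` parameters are invented for illustration only.

```python
import math

def kernel_flops(k, c_in, c_out, h, w):
    """FLOPs of a k x k convolution over an h x w feature map."""
    return 2 * c_in * c_out * k * k * h * w

def kernel_score(k, c_in, c_out, h, w, sigma=1.0, lam=1e-9):
    """Illustrative trade-off: receptive-field information gain minus FLOPs penalty.

    The log term is a stand-in for the information gained by a larger receptive
    field (diminishing returns); lam weights the computational cost. Both are
    assumptions, not the published BKSEF formula.
    """
    info_gain = math.log2(1.0 + sigma * k * k)
    return info_gain - lam * kernel_flops(k, c_in, c_out, h, w)

def best_kernel_size(c_in, c_out, h, w, candidates=(1, 3, 5, 7)):
    return max(candidates, key=lambda k: kernel_score(k, c_in, c_out, h, w))

if __name__ == "__main__":
    # Early layer, large feature map: the FLOPs penalty favours a smaller kernel.
    print(best_kernel_size(32, 64, 112, 112))
    # Late layer, small feature map: information gain dominates, a larger kernel wins.
    print(best_kernel_size(256, 256, 7, 7))
```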
Architecture design is inherently complex. Existing approaches rely on either handcrafted rules, which demand extensive empirical expertise, or automated methods like neural architecture search, which are computationally intensive. In this paper, we introduce DMAO, an architecture optimization framework that employs a grow-and-drop strategy to automatically reallocate parameters during training. This reallocation shifts resources from less-utilized areas to those parts of the model where they are most beneficial. Notably, DMAO introduces only negligible training overhead at a given model complexity. We evaluate DMAO through experiments with CTC on the LibriSpeech, TED-LIUM-v2 and Switchboard datasets. The results show that, using the same amount of training resources, our proposed DMAO consistently improves WER by up to 6% relative across various architectures, model sizes, and datasets. Furthermore, we analyze the pattern of parameter redistribution and uncover insightful findings.
https://arxiv.org/abs/2506.13180
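A toy version of the grow-and-drop reallocation described above, applied to two stand-in blocks: estimate a utilization proxy per block (here, mean absolute weight, which is an assumption rather than the paper's criterion), then resize each block's parameter budget in proportion to it, copying surviving weights across.

```python
import torch
import torch.nn as nn

def grow_and_drop(blocks, budget):
    """Toy grow-and-drop step: rank blocks by a utilization proxy, then move
    capacity from the least-used blocks to the most-used ones.

    blocks: dict name -> nn.Linear (stand-ins for encoder sub-modules)
    budget: total number of hidden units to distribute across blocks
    """
    # Utilization proxy (an assumption): mean absolute outgoing weight.
    util = {n: m.weight.abs().mean().item() for n, m in blocks.items()}
    total = sum(util.values())
    resized = {}
    for name, module in blocks.items():
        share = max(1, round(budget * util[name] / total))   # grow or shrink
        new = nn.Linear(module.in_features, share)
        keep = min(share, module.out_features)
        with torch.no_grad():                                 # carry over surviving rows
            new.weight[:keep] = module.weight[:keep]
            new.bias[:keep] = module.bias[:keep]
        resized[name] = new
    return resized

if __name__ == "__main__":
    blocks = {"ffn1": nn.Linear(64, 128), "ffn2": nn.Linear(64, 128)}
    with torch.no_grad():
        blocks["ffn2"].weight.mul_(0.05)   # simulate an under-utilized block
    resized = grow_and_drop(blocks, budget=256)
    print({n: m.out_features for n, m in resized.items()})
```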
This paper introduces Evolutionary Multi-Objective Network Architecture Search (EMNAS) for the first time to optimize neural network architectures in large-scale Reinforcement Learning (RL) for Autonomous Driving (AD). EMNAS uses genetic algorithms to automate network design, tailored to enhance rewards and reduce model size without compromising performance. Additionally, parallelization techniques are employed to accelerate the search, and teacher-student methodologies are implemented to ensure scalable optimization. This research underscores the potential of transfer learning as a robust framework for optimizing performance across iterative learning processes by effectively leveraging knowledge from earlier generations to enhance learning efficiency and stability in subsequent generations. Experimental results demonstrate that tailored EMNAS outperforms manually designed models, achieving higher rewards with fewer parameters. These findings contribute positively to EMNAS for RL in autonomous driving, advancing the field toward better-performing networks suitable for real-world scenarios.
https://arxiv.org/abs/2506.08533
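A minimal genetic-algorithm loop in the spirit of EMNAS: genomes encode layer widths, fitness scalarizes reward against parameter count, and truncation selection plus crossover and mutation produce the next generation. The `evaluate` function only simulates an RL reward so the loop runs; real fitness would come from driving episodes.

```python
import random

LAYER_CHOICES = [16, 32, 64, 128]          # hidden widths per layer (the "genes")

def random_genome(n_layers=4):
    return [random.choice(LAYER_CHOICES) for _ in range(n_layers)]

def evaluate(genome):
    """Stand-in fitness: in the paper this would be RL reward from driving episodes
    minus a penalty for model size; here it is simulated so the loop runs."""
    params = sum(a * b for a, b in zip(genome, genome[1:]))
    reward = sum(genome) * 0.01 + random.gauss(0, 0.1)        # fake "driving reward"
    return reward - 1e-5 * params                             # multi-objective scalarization

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(genome, rate=0.2):
    return [random.choice(LAYER_CHOICES) if random.random() < rate else g for g in genome]

def evolve(pop_size=20, generations=10):
    population = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=evaluate, reverse=True)
        parents = scored[: pop_size // 4]                     # truncation selection
        population = parents + [
            mutate(crossover(*random.sample(parents, 2)))
            for _ in range(pop_size - len(parents))
        ]
    return max(population, key=evaluate)

if __name__ == "__main__":
    print("best genome:", evolve())
```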
Evaluation is a critical but costly procedure in neural architecture search (NAS). Performance predictors have been widely adopted to reduce evaluation costs by directly estimating architecture performance. The effectiveness of predictors is heavily influenced by the choice of loss functions. While traditional predictors employ regression loss functions to evaluate the absolute accuracy of architectures, recent approaches have explored various ranking-based loss functions, such as pairwise and listwise ranking losses, to focus on the ranking of architecture performance. Despite their success in NAS, the effectiveness and characteristics of these loss functions have not been thoroughly investigated. In this paper, we conduct the first comprehensive study on loss functions in performance predictors, categorizing them into three main types: regression, ranking, and weighted loss functions. Specifically, we assess eight loss functions using a range of NAS-relevant metrics on 13 tasks across five search spaces. Our results reveal that specific categories of loss functions can be effectively combined to enhance predictor-based NAS. Furthermore, our findings could provide practical guidance for selecting appropriate loss functions for various tasks. We hope this work provides meaningful insights to guide the development of loss functions for predictor-based methods in the NAS community.
https://arxiv.org/abs/2506.05869
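To make the regression-vs-ranking distinction above concrete, the sketch below scores the same stand-in predictor with an MSE loss (fit absolute accuracy) and with a pairwise hinge ranking loss (only order architectures correctly). The predictor architecture and margin are illustrative choices, not those benchmarked in the study.

```python
import torch
import torch.nn as nn

predictor = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))

def regression_loss(arch_feats, true_acc):
    """Fit absolute accuracy (MSE) -- the 'regression' family."""
    pred = predictor(arch_feats).squeeze(-1)
    return nn.functional.mse_loss(pred, true_acc)

def pairwise_ranking_loss(arch_feats, true_acc, margin=0.1):
    """Only ask the predictor to order architectures correctly -- the 'ranking' family."""
    pred = predictor(arch_feats).squeeze(-1)
    # All pairs (i, j) where architecture i is truly better than architecture j.
    diff_true = true_acc.unsqueeze(1) - true_acc.unsqueeze(0)
    diff_pred = pred.unsqueeze(1) - pred.unsqueeze(0)
    mask = (diff_true > 0).float()
    hinge = torch.clamp(margin - diff_pred, min=0.0)
    return (hinge * mask).sum() / mask.sum().clamp(min=1.0)

if __name__ == "__main__":
    feats = torch.randn(32, 16)            # encoded architectures
    acc = torch.rand(32)                   # their measured accuracies
    print(regression_loss(feats, acc).item(), pairwise_ranking_loss(feats, acc).item())
```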
Performance predictors have emerged as a promising method to accelerate the evaluation stage of neural architecture search (NAS). These predictors estimate the performance of unseen architectures by learning from the correlation between a small set of trained architectures and their performance. However, most existing predictors ignore the inherent distribution shift between limited training samples and diverse test samples. Hence, they tend to learn spurious correlations as shortcuts to predictions, leading to poor generalization. To address this, we propose a Causality-guided Architecture Representation Learning (CARL) method aiming to separate critical (causal) and redundant (non-causal) features of architectures for generalizable architecture performance prediction. Specifically, we employ a substructure extractor to split the input architecture into critical and redundant substructures in the latent space. Then, we generate multiple interventional samples by pairing critical representations with diverse redundant representations to prioritize critical features. Extensive experiments on five NAS search spaces demonstrate the state-of-the-art accuracy and superior interpretability of CARL. For instance, CARL achieves 97.67% top-1 accuracy on CIFAR-10 using DARTS.
https://arxiv.org/abs/2506.04001
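A toy rendering of the intervention step described above: stub a substructure extractor that splits each architecture embedding into a critical and a redundant part, then build interventional samples by pairing every critical representation with redundant parts permuted from other samples. The two-linear-head extractor is an assumption purely for illustration.

```python
import torch
import torch.nn as nn

class SubstructureExtractor(nn.Module):
    """Stub: splits an architecture embedding into critical and redundant parts."""
    def __init__(self, dim=32, hidden=16):
        super().__init__()
        self.critical = nn.Linear(dim, hidden)
        self.redundant = nn.Linear(dim, hidden)

    def forward(self, z):
        return self.critical(z), self.redundant(z)

def interventional_batch(critical, redundant, n_interventions=4):
    """Pair every critical representation with redundant parts drawn from other
    samples, so a downstream predictor learns to rely on the critical part."""
    samples = []
    for _ in range(n_interventions):
        perm = torch.randperm(redundant.size(0))
        samples.append(torch.cat([critical, redundant[perm]], dim=-1))
    return torch.cat(samples, dim=0)       # (n_interventions * batch, 2 * hidden)

if __name__ == "__main__":
    z = torch.randn(8, 32)                 # encoded architectures
    crit, red = SubstructureExtractor()(z)
    x = interventional_batch(crit, red)
    print(x.shape)                         # torch.Size([32, 32])
```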
Modern neural architecture search (NAS) is inherently multi-objective, balancing trade-offs such as accuracy, parameter count, and computational cost. This complexity makes NAS computationally expensive and nearly impossible to solve without efficient approximations. To address this, we propose a novel surrogate modelling approach that leverages an ensemble of Siamese network blocks to predict dominance relationships between candidate architectures. Lightweight and easy to train, the surrogate achieves 92% accuracy and replaces the crowding distance calculation in the survivor selection strategy with a heuristic rule based on model size. Integrated into a framework termed SiamNAS, this design eliminates costly evaluations during the search process. Experiments on NAS-Bench-201 demonstrate the framework's ability to identify Pareto-optimal solutions with significantly reduced computational costs. The proposed SiamNAS identified a final non-dominated set containing the best architecture in NAS-Bench-201 for CIFAR-10 and the second-best for ImageNet, in terms of test error rate, within 0.01 GPU days. This proof-of-concept study highlights the potential of the proposed Siamese network surrogate model to generalise to multi-tasking optimisation, enabling simultaneous optimisation across tasks. Additionally, it offers opportunities to extend the approach for generating Sets of Pareto Sets (SOS), providing diverse Pareto-optimal solutions for heterogeneous task settings.
https://arxiv.org/abs/2506.02623
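A minimal Siamese surrogate in the spirit of the abstract above: a shared encoder embeds two candidate architectures and a small head predicts whether the first Pareto-dominates the second. The feature encoding, objectives, and training pairs are placeholders, not the SiamNAS ensemble.

```python
import torch
import torch.nn as nn

class DominancePredictor(nn.Module):
    """Siamese surrogate: shared encoder, then a head that classifies whether
    architecture A dominates architecture B."""
    def __init__(self, feat_dim=16, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, arch_a, arch_b):
        ea, eb = self.encoder(arch_a), self.encoder(arch_b)
        return torch.sigmoid(self.head(torch.cat([ea, eb], dim=-1))).squeeze(-1)

def dominates(obj_a, obj_b):
    """Pareto-dominance label for training pairs (objectives to be minimized)."""
    return (obj_a <= obj_b).all(dim=-1) & (obj_a < obj_b).any(dim=-1)

if __name__ == "__main__":
    model = DominancePredictor()
    a, b = torch.randn(4, 16), torch.randn(4, 16)        # encoded candidate pairs
    objs_a, objs_b = torch.rand(4, 2), torch.rand(4, 2)   # e.g. (error, params)
    labels = dominates(objs_a, objs_b).float()
    loss = nn.functional.binary_cross_entropy(model(a, b), labels)
    print(loss.item())
```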
Diffusion models (DMs) are powerful generative models capable of producing high-fidelity images but are constrained by high computational costs due to iterative multi-step inference. While Neural Architecture Search (NAS) can optimize DMs, existing methods are hindered by retraining requirements, exponential search complexity from step-wise optimization, and slow evaluation relying on massive image generation. To address these challenges, we propose Flexiffusion, a training-free NAS framework that jointly optimizes generation schedules and model architectures without modifying pre-trained parameters. Our key insight is to decompose the generation process into flexible segments of equal length, where each segment dynamically combines three step types: full (complete computation), partial (cache-reused computation), and null (skipped computation). This segment-wise search space reduces the candidate pool exponentially compared to step-wise NAS while preserving architectural diversity. Further, we introduce relative FID (rFID), a lightweight evaluation metric for NAS that measures divergence from a teacher model's outputs instead of ground truth, slashing evaluation time by over $90\%$. In practice, Flexiffusion achieves at least $2\times$ acceleration across LDMs, Stable Diffusion, and DDPMs on ImageNet and MS-COCO, with FID degradation under $5\%$, outperforming prior NAS and caching methods. Notably, it attains $5.1\times$ speedup on Stable Diffusion with near-identical CLIP scores. Our work pioneers a resource-efficient paradigm for searching high-speed DMs without sacrificing quality.
https://arxiv.org/abs/2506.02488
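A toy sampler showing the segment-wise search space of full / partial / null steps described above: full steps recompute and refresh a feature cache, partial steps reuse it, and null steps are skipped outright. The `toy_model` and what exactly gets cached are stand-ins, not the paper's diffusion architecture.

```python
import torch

def run_schedule(x, model, schedule, segment_len=4):
    """Toy sampler: 'full' recomputes everything, 'partial' reuses cached deep
    features, 'null' skips the step entirely. `model(x, reuse)` is a stand-in."""
    cache = None
    for seg_start in range(0, len(schedule), segment_len):
        for step_type in schedule[seg_start:seg_start + segment_len]:
            if step_type == "null":
                continue                                    # skipped computation
            reuse = cache if step_type == "partial" else None
            x, cache = model(x, reuse)                      # cache refreshed on full steps
    return x

def toy_model(x, cached_features):
    deep = cached_features if cached_features is not None else torch.tanh(x * 2.0)
    return x - 0.1 * deep, deep                             # one fake denoising update

if __name__ == "__main__":
    # One candidate in the segment-wise search space: each segment mixes step types.
    schedule = ["full", "partial", "partial", "null"] * 3   # 12 nominal steps
    out = run_schedule(torch.randn(1, 4), toy_model, schedule)
    print(out.shape, "full steps:", schedule.count("full"))
```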
To address the weight coupling problem, certain studies introduced few-shot Neural Architecture Search (NAS) methods, which partition the supernet into multiple sub-supernets. However, these methods often suffer from computational inefficiency and tend to provide suboptimal partitioning schemes. To address this problem more effectively, we analyze the weight coupling problem from a novel perspective, which primarily stems from distinct modules in succeeding layers imposing conflicting gradient directions on the preceding layer modules. Based on this perspective, we propose the Gradient Contribution (GC) method that efficiently computes the cosine similarity of gradient directions among modules by decomposing the Vector-Jacobian Product during supernet backpropagation. Subsequently, the modules with conflicting gradient directions are allocated to distinct sub-supernets while similar ones are grouped together. To assess the advantages of GC and address the limitations of existing Graph Neural Architecture Search methods, which are limited to searching a single type of Graph Neural Networks (Message Passing Neural Networks (MPNNs) or Graph Transformers (GTs)), we propose the Unified Graph Neural Architecture Search (UGAS) framework, which explores optimal combinations of MPNNs and GTs. The experimental results demonstrate that GC achieves state-of-the-art (SOTA) performance in supernet partitioning quality and time efficiency. In addition, the architectures searched by UGAS+GC outperform both the manually designed GNNs and those obtained by existing NAS methods. Finally, ablation studies further demonstrate the effectiveness of all proposed methods.
https://arxiv.org/abs/2506.01231
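A simplified sketch of the Gradient Contribution idea above: take the gradient each candidate module sends back to the shared preceding layer, compare directions by cosine similarity, and greedily group modules that agree into the same sub-supernet. The VJP decomposition and the actual supernet are not reproduced; linear candidates and a greedy grouping rule stand in for them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gradient_directions(prev_out, modules, target):
    """Gradient each candidate module sends back to the shared preceding layer."""
    grads = []
    for m in modules:
        x = prev_out.detach().requires_grad_(True)
        loss = F.mse_loss(m(x), target)
        grads.append(torch.autograd.grad(loss, x)[0].flatten())
    return torch.stack(grads)                               # (n_modules, numel)

def group_by_similarity(grads, threshold=0.0):
    """Greedy grouping: a module joins a sub-supernet only if its gradient
    direction agrees (cosine > threshold) with every member already in it."""
    groups = []
    for i in range(grads.size(0)):
        placed = False
        for g in groups:
            if all(F.cosine_similarity(grads[i], grads[j], dim=0) > threshold for j in g):
                g.append(i)
                placed = True
                break
        if not placed:
            groups.append([i])
    return groups

if __name__ == "__main__":
    prev_out = torch.randn(8, 32)                           # shared preceding-layer output
    candidates = nn.ModuleList([nn.Linear(32, 32) for _ in range(4)])
    target = torch.randn(8, 32)
    grads = gradient_directions(prev_out, candidates, target)
    print(group_by_similarity(grads))                       # e.g. [[0, 2], [1], [3]]
```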
Retinal diseases such as Diabetic Retinopathy (DR) and Macular Hole (MH) significantly impact vision and affect millions worldwide. Early detection is crucial, as DR, a complication of diabetes, damages retinal blood vessels, potentially leading to blindness, while MH disrupts central vision, affecting tasks like reading and facial recognition. This paper employed two lightweight and efficient Convolutional Neural Network architectures, MobileNet and NASNetMobile, for the classification of Normal, DR, and MH retinal images. The models were trained on the RFMiD dataset of 3,200 fundus images, which were preprocessed with resizing, normalization, and augmentation. To address data scarcity, this study leveraged transfer learning and data augmentation techniques, enhancing model generalization and performance. The experimental results demonstrate that MobileNetV2 achieved the highest accuracy of 90.8%, outperforming NASNetMobile, which achieved 89.5% accuracy. These findings highlight the effectiveness of CNNs in retinal disease classification, providing a foundation for AI-assisted ophthalmic diagnosis and early intervention.
https://arxiv.org/abs/2506.03186
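A minimal Keras transfer-learning sketch matching the setup described above: a frozen ImageNet-pretrained MobileNetV2 backbone, the model's own preprocessing, light augmentation, and a 3-class softmax head for Normal/DR/MH. The dataset path and training call are left commented out as assumptions about the local data layout.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model(num_classes=3, img_size=224):
    base = tf.keras.applications.MobileNetV2(
        input_shape=(img_size, img_size, 3), include_top=False, weights="imagenet")
    base.trainable = False                      # transfer learning: freeze the backbone
    inputs = tf.keras.Input(shape=(img_size, img_size, 3))
    x = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)
    x = base(x, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.2)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Light augmentation to mitigate data scarcity.
augment = tf.keras.Sequential([layers.RandomFlip("horizontal"),
                               layers.RandomRotation(0.1)])

if __name__ == "__main__":
    model = build_model()
    model.summary()
    # Hypothetical local layout of the RFMiD images, one folder per class:
    # train_ds = tf.keras.utils.image_dataset_from_directory("rfmid/", image_size=(224, 224))
    # model.fit(train_ds.map(lambda x, y: (augment(x), y)), epochs=10)
```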
Dynamic Spectral Backpropagation (DSBP) enhances neural network training under resource constraints by projecting gradients onto principal eigenvectors, reducing complexity and promoting flat minima. Five extensions are proposed: dynamic spectral inference, spectral architecture optimization, spectral meta learning, spectral transfer regularization, and Lie algebra inspired dynamics, to address challenges in robustness, few-shot learning, and hardware efficiency. Supported by a third-order stochastic differential equation (SDE) and a PAC-Bayes bound, DSBP outperforms Sharpness Aware Minimization (SAM), Low Rank Adaptation (LoRA), and Model Agnostic Meta Learning (MAML) on CIFAR-10, Fashion-MNIST, MedMNIST, and Tiny ImageNet, as demonstrated through extensive experiments and visualizations. Future work focuses on scalability, bias mitigation, and ethical considerations.
https://arxiv.org/abs/2505.23369
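A toy version of the core projection step above: estimate principal directions from a window of recent gradients via SVD and project the current gradient onto that low-rank subspace before the update. Using the gradient history as the spectral source and `rank` as a free parameter are assumptions; the SDE analysis and the five extensions are not reproduced.

```python
import torch

def spectral_project(grad, history, rank=2):
    """Project a gradient onto the top-`rank` principal directions of recent gradients.

    grad:    current flattened gradient, shape (d,)
    history: matrix of recent flattened gradients, shape (t, d)
    """
    # Principal directions of the gradient history (rows are observations).
    _, _, vT = torch.linalg.svd(history - history.mean(dim=0), full_matrices=False)
    basis = vT[:rank]                              # (rank, d) top right-singular vectors
    return basis.T @ (basis @ grad)                # projection onto their span

if __name__ == "__main__":
    torch.manual_seed(0)
    d, t = 100, 20
    history = torch.randn(t, d)
    grad = torch.randn(d)
    g_proj = spectral_project(grad, history, rank=5)
    print(grad.norm().item(), "->", g_proj.norm().item())  # projected gradient is shorter
    # A parameter update would then use g_proj in place of grad, e.g. w -= lr * g_proj.
```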
Artificial intelligence and machine learning models deployed on edge devices, e.g., for quality control in Additive Manufacturing (AM), are frequently small in size. Such models usually have to deliver highly accurate results within a short time frame. Methods that are commonly employed in literature start out with larger trained models and try to reduce their memory and latency footprint by structural pruning, knowledge distillation, or quantization. It is, however, also possible to leverage hardware-aware Neural Architecture Search (NAS), an approach that seeks to systematically explore the architecture space to find optimized configurations. In this study, a hardware-aware NAS workflow is introduced that couples an edge device located in Belgium with a powerful High-Performance Computing system in Germany, to train possible architecture candidates as fast as possible while performing real-time latency measurements on the target hardware. The approach is verified on a use case in the AM domain, based on the open RAISE-LPBF dataset, achieving ~8.8 times faster inference speed while simultaneously enhancing model quality by a factor of ~1.35, compared to a human-designed baseline.
https://arxiv.org/abs/2505.19995
Optical microrobots, manipulated via optical tweezers (OT), have broad applications in biomedicine. However, reliable pose and depth perception remain fundamental challenges due to the transparent or low-contrast nature of the microrobots, as well as the noisy and dynamic conditions of the microscale environments in which they operate. An open dataset is crucial for enabling reproducible research, facilitating benchmarking, and accelerating the development of perception models tailored to microscale challenges. Standardised evaluation enables consistent comparison across algorithms, ensuring objective benchmarking and facilitating reproducible research. Here, we introduce the OpTical MicroRobot dataset (OTMR), the first publicly available dataset designed to support microrobot perception under the optical microscope. OTMR contains 232,881 images spanning 18 microrobot types and 176 distinct poses. We benchmarked the performance of eight deep learning models, including architectures derived via neural architecture search (NAS), on two key tasks: pose classification and depth regression. Results indicate that the Vision Transformer (ViT) achieves the highest accuracy in pose classification, while depth regression benefits from deeper architectures. Additionally, increasing the size of the training dataset leads to substantial improvements across both tasks, highlighting OTMR's potential as a foundational resource for robust and generalisable microrobot perception in complex microscale environments.
https://arxiv.org/abs/2505.18303
Medical Image Segmentation (MIS) includes diverse tasks, from bone to organ segmentation, each with its own challenges in finding the best segmentation model. The state-of-the-art AutoML-related MIS framework nnU-Net automates many aspects of model configuration but remains constrained by fixed hyperparameters and heuristic design choices. As a full-AutoML framework for MIS, we propose Auto-nnU-Net, a novel nnU-Net variant enabling hyperparameter optimization (HPO), neural architecture search (NAS), and hierarchical NAS (HNAS). Additionally, we propose Regularized PriorBand to balance model accuracy with the computational resources required for training, addressing the resource constraints often faced in real-world medical settings that limit the feasibility of extensive training procedures. We evaluate our approach across diverse MIS datasets from the well-established Medical Segmentation Decathlon, analyzing the impact of AutoML techniques on segmentation performance, computational efficiency, and model design choices. The results demonstrate that our AutoML approach substantially improves the segmentation performance of nnU-Net on 6 out of 10 datasets and is on par with it on the remaining datasets, while maintaining practical resource requirements. Our code is available at this https URL.
https://arxiv.org/abs/2505.16561
Integrating Large Language Models (LLMs) and Evolutionary Computation (EC) represents a promising avenue for advancing artificial intelligence by combining powerful natural language understanding with optimization and search capabilities. This manuscript explores the synergistic potential of LLMs and EC, reviewing their intersections, complementary strengths, and emerging applications. We identify key opportunities where EC can enhance LLM training, fine-tuning, prompt engineering, and architecture search, while LLMs can, in turn, aid in automating the design, analysis, and interpretation of ECs. Highlighting these bidirectional contributions, the survey first examines how EC techniques enhance LLMs by optimizing key components such as prompt engineering, hyperparameter tuning, and architecture search, demonstrating how evolutionary methods automate and refine these processes. Secondly, it investigates how LLMs improve EC by automating metaheuristic design, tuning evolutionary algorithms, and generating adaptive heuristics, thereby increasing efficiency and scalability. Emerging co-evolutionary frameworks are discussed, showcasing applications across diverse fields while acknowledging challenges like computational costs, interpretability, and algorithmic convergence. The survey concludes by identifying open research questions and advocating for hybrid approaches that combine the strengths of EC and LLMs.
https://arxiv.org/abs/2505.15741