There has been a surge in optimizing edge Deep Neural Networks (DNNs) for accuracy and efficiency using traditional optimization techniques such as pruning and, more recently, automatic design methodologies. However, these design techniques have often overlooked critical metrics such as fairness, robustness, and generalization. As a result, when evaluating the image classification performance of SOTA edge DNNs on the FACET dataset, we found that they exhibit significant accuracy disparities (14.09%) across 10 different skin tones, alongside issues of non-robustness and poor generalizability. In response to these observations, we introduce Mixture-of-Experts-based Neural Architecture Search (MoENAS), an automatic design technique that navigates a space of mixtures of experts to discover accurate, fair, robust, and general edge DNNs. MoENAS improves accuracy by 4.02% compared to SOTA edge DNNs and reduces the skin tone accuracy disparities from 14.09% to 5.60%, while enhancing robustness by 3.80% and minimizing overfitting to 0.21%, all while keeping model size close to the average size of state-of-the-art models (+0.4M). With these improvements, MoENAS establishes a new benchmark for edge DNN design, paving the way for the development of more inclusive and robust edge DNNs.
https://arxiv.org/abs/2502.07422
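The skin-tone disparity figure quoted in the MoENAS abstract can be illustrated with a small sketch. The abstract does not define the metric, so the definition used here (best-group accuracy minus worst-group accuracy) is an assumption, and the toy data is hypothetical rather than drawn from FACET.

```python
from collections import defaultdict


def per_group_accuracy(labels, preds, groups):
    """Return {group: accuracy} for parallel lists of labels, predictions and group ids."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p, g in zip(labels, preds, groups):
        total[g] += 1
        correct[g] += int(y == p)
    return {g: correct[g] / total[g] for g in total}


def accuracy_disparity(labels, preds, groups):
    """Assumed definition: best-group accuracy minus worst-group accuracy."""
    acc = per_group_accuracy(labels, preds, groups)
    return max(acc.values()) - min(acc.values())


# Hypothetical toy data with two groups (not FACET).
labels = [1, 0, 1, 1, 0, 1, 0, 0]
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
```

With this definition, a model that is perfect on one group and 50% accurate on another has a disparity of 0.5, regardless of its overall accuracy.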
Deep learning has revolutionized computer vision, but it achieved its tremendous success using deep network architectures which are mostly hand-crafted and therefore likely suboptimal. Neural Architecture Search (NAS) aims to bridge this gap by following a well-defined optimization paradigm which systematically looks for the best architecture, given an objective criterion such as maximal classification accuracy. The main limitation of NAS, however, is its astronomical computational cost, as it typically requires training each candidate network architecture from scratch. In this paper, we aim to alleviate this limitation by proposing a novel training-free proxy for image classification accuracy based on Fisher Information. The proposed proxy has a strong theoretical background in statistics, and it allows estimating the expected image classification accuracy of a given deep network without training the network, thus significantly reducing the computational cost of standard NAS algorithms. Our training-free proxy achieves state-of-the-art results on three public datasets and in two search spaces, both when evaluated using previously proposed metrics and when using a new metric that we propose, which we demonstrate is more informative for practical NAS applications. The source code is publicly available.
https://arxiv.org/abs/2502.04975
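To make the notion of Fisher Information at initialization concrete, here is a minimal sketch for a linear softmax classifier. It is not the proxy proposed in the paper, whose exact form the abstract does not give; the model, the function names and the choice of the Fisher trace as a summary are all assumptions made for illustration.

```python
import math


def softmax(z):
    """Numerically stable softmax of a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def fisher_trace(W, xs):
    """Trace of the Fisher information of a linear softmax model, averaged over inputs xs.

    For logits z = W x, grad_W log p(y|x) = (onehot(y) - p) x^T, so its squared
    norm is ||onehot(y) - p||^2 * ||x||^2. The expectation over y ~ p(.|x) is
    taken in closed form: E_y ||onehot(y) - p||^2 = 1 - sum_k p_k^2.
    """
    total = 0.0
    for x in xs:
        z = [sum(w * v for w, v in zip(row, x)) for row in W]
        p = softmax(z)
        x_sq = sum(v * v for v in x)
        total += (1.0 - sum(pk * pk for pk in p)) * x_sq
    return total / len(xs)
```

No training step appears anywhere: the quantity depends only on the untrained weights and a batch of inputs, which is what makes scores of this family cheap to evaluate for many candidate networks.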
Large Language Model (LLM)-empowered multi-agent systems extend the cognitive boundaries of individual agents through disciplined collaboration and interaction, while constructing these systems often requires labor-intensive manual designs. Despite the availability of methods to automate the design of agentic workflows, they typically seek to identify a static, complex, one-size-fits-all system, which, however, fails to dynamically allocate inference resources based on the difficulty and domain of each query. To address this challenge, we shift away from the pursuit of a monolithic agentic system, instead optimizing the agentic supernet, a probabilistic and continuous distribution of agentic architectures. We introduce MaAS, an automated framework that samples query-dependent agentic systems from the supernet, delivering high-quality solutions and tailored resource allocation (e.g., LLM calls, tool calls, token cost). Comprehensive evaluation across six benchmarks demonstrates that MaAS (I) requires only 6% to 45% of the inference costs of existing handcrafted or automated multi-agent systems, (II) surpasses them by 0.54% to 11.82%, and (III) enjoys superior cross-dataset and cross-LLM-backbone transferability.
https://arxiv.org/abs/2502.04180
Neural architecture search (NAS) has shown promise towards automating neural network design for a given task, but it is computationally demanding due to the training costs associated with evaluating a large number of architectures to find the optimal one. To speed up NAS, recent works limit the search to network building blocks (modular search) instead of searching the entire architecture (global search), approximate candidates' performance evaluation in lieu of complete training, and use gradient descent rather than naturally suitable discrete optimization approaches. However, modular search does not determine the network's macro architecture, i.e., depth and width, demanding manual trial and error post-search and hence lacking automation. In this work, we revisit NAS and design a navigable, yet architecturally diverse, macro-micro search space. In addition, to determine relative rankings of candidates, existing methods employ consistent approximations across entire search spaces, whereas different networks may not be fairly comparable under one training protocol. Hence, we propose an architecture-aware approximation with variable training schemes for different networks. Moreover, we develop an efficient search strategy by disjoining macro-micro network design that yields competitive architectures in terms of both accuracy and size. Our proposed framework achieves a new state-of-the-art on EMNIST and KMNIST, while being highly competitive on the CIFAR-10, CIFAR-100, and FashionMNIST datasets and being 2-4x faster than the fastest global search methods. Lastly, we demonstrate the transferability of our framework to real-world computer vision problems by discovering competitive architectures for face recognition applications.
https://arxiv.org/abs/2502.03553
Neural Architecture Search (NAS) aims to automate the design of deep neural networks. However, existing NAS techniques often focus on maximising accuracy, neglecting model efficiency. This limitation restricts their use in resource-constrained environments like mobile devices and edge computing systems. Moreover, current evaluation metrics prioritise performance over efficiency, lacking a balanced approach for assessing architectures suitable for constrained scenarios. To address these challenges, this paper introduces the M-factor, a novel metric combining model accuracy and size. Four diverse NAS techniques are compared: Policy-Based Reinforcement Learning, Regularised Evolution, Tree-structured Parzen Estimator (TPE), and Multi-trial Random Search. These techniques represent different NAS paradigms, providing a comprehensive evaluation of the M-factor. The study analyses ResNet configurations on the CIFAR-10 dataset, with a search space of 19,683 configurations. Experiments reveal that Policy-Based Reinforcement Learning and Regularised Evolution achieved M-factor values of 0.84 and 0.82, respectively, while Multi-trial Random Search attained 0.75, and TPE reached 0.67. Policy-Based Reinforcement Learning exhibited performance changes after 39 trials, while Regularised Evolution optimised within 20 trials. The research investigates the optimisation dynamics and trade-offs between accuracy and model size for each strategy. Findings indicate that, in some cases, random search performed comparably to more complex algorithms when assessed using the M-factor. These results highlight how the M-factor addresses the limitations of existing metrics by guiding NAS towards balanced architectures, offering valuable insights for selecting strategies in scenarios requiring both performance and efficiency.
https://arxiv.org/abs/2501.17361
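Regularised Evolution, one of the four strategies compared in the M-factor study, can be sketched as follows. Two things here are assumptions: the search space is modelled as 9 independent three-way choices only because 3^9 = 19,683 matches the reported size, and `toy_score` is a hypothetical stand-in for a trained model's score. It is not the M-factor, whose formula the abstract does not state.

```python
import random
from collections import deque

NUM_CHOICES, NUM_OPTIONS = 9, 3  # assumption: 3 ** 9 == 19,683 configurations


def toy_score(config):
    """Hypothetical score in [0, 1]; higher is better. Not the M-factor."""
    return sum(config) / (NUM_CHOICES * (NUM_OPTIONS - 1))


def mutate(config, rng):
    """Change one randomly chosen position to a different option."""
    child = list(config)
    i = rng.randrange(NUM_CHOICES)
    child[i] = rng.choice([o for o in range(NUM_OPTIONS) if o != child[i]])
    return tuple(child)


def regularized_evolution(trials, population_size=10, sample_size=3, seed=0):
    """Aging evolution: the oldest individual is removed each step, not the worst."""
    rng = random.Random(seed)
    population = deque()
    for _ in range(population_size):
        cfg = tuple(rng.randrange(NUM_OPTIONS) for _ in range(NUM_CHOICES))
        population.append((cfg, toy_score(cfg)))
    best = max(population, key=lambda item: item[1])
    for _ in range(trials - population_size):
        sample = rng.sample(list(population), sample_size)
        parent = max(sample, key=lambda item: item[1])  # tournament selection
        child = mutate(parent[0], rng)
        entry = (child, toy_score(child))
        population.append(entry)
        population.popleft()  # discard the oldest member
        if entry[1] > best[1]:
            best = entry
    return best
```

Calling `regularized_evolution(50)` evaluates 50 configurations in total, counting the initial random population. Swapping `toy_score` for a metric that combines accuracy and model size would turn this into a size-aware search.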
Pneumonia is a leading cause of illness and death in children, underscoring the need for early and accurate detection. In this study, we propose a novel lightweight ensemble model for detecting pneumonia in children using chest X-ray images. This ensemble model integrates two pre-trained convolutional neural networks (CNNs), MobileNetV2 and NASNetMobile, selected for their balance of computational efficiency and accuracy. These models were fine-tuned on a pediatric chest X-ray dataset and combined to enhance classification performance. Our proposed ensemble model achieved a classification accuracy of 98.63%, significantly outperforming individual models such as MobileNetV2 (97.10%) and NASNetMobile (96.25%) in terms of accuracy, precision, recall, and F1 score. Moreover, the ensemble model outperformed state-of-the-art architectures, including ResNet50, InceptionV3, and DenseNet201, while maintaining computational efficiency. The proposed lightweight ensemble model presents a highly effective and resource-efficient solution for pneumonia detection, making it particularly suitable for deployment in resource-constrained settings.
https://arxiv.org/abs/2501.16249
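The pneumonia abstract says the two fine-tuned CNNs are combined but does not say how. One common option is soft voting, sketched below under that assumption; the function names and the equal default weighting are illustrative choices, not the authors' stated method.

```python
def soft_vote(prob_a, prob_b, weight_a=0.5):
    """Weighted average of two class-probability vectors of equal length."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(prob_a, prob_b)]


def predict(prob_a, prob_b, weight_a=0.5):
    """Index of the class with the highest averaged probability."""
    avg = soft_vote(prob_a, prob_b, weight_a)
    return max(range(len(avg)), key=avg.__getitem__)


# Hypothetical outputs of two models for classes [normal, pneumonia].
model_a_probs = [0.6, 0.4]
model_b_probs = [0.1, 0.9]
```

In this toy case the two models disagree, and the more confident one decides the averaged prediction.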
Resource-constrained edge deployments demand AI solutions that balance high performance with stringent compute, memory, and energy limitations. In this survey, we present a comprehensive overview of the primary strategies for accelerating deep learning models under such constraints. First, we examine model compression techniques (pruning, quantization, tensor decomposition, and knowledge distillation) that streamline large models into smaller, faster, and more efficient variants. Next, we explore Neural Architecture Search (NAS), a class of automated methods that discover architectures inherently optimized for particular tasks and hardware budgets. We then discuss compiler and deployment frameworks, such as TVM, TensorRT, and OpenVINO, which provide hardware-tailored optimizations at inference time. By integrating these three pillars into unified pipelines, practitioners can achieve multi-objective goals, including latency reduction, memory savings, and energy efficiency, all while maintaining competitive accuracy. We also highlight emerging frontiers in hierarchical NAS, neurosymbolic approaches, and advanced distillation tailored to large language models, underscoring open challenges like pre-training pruning for massive networks. Our survey offers practical insights, identifies current research gaps, and outlines promising directions for building scalable, platform-independent frameworks to accelerate deep learning models at the edge.
https://arxiv.org/abs/2501.15014
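Two of the compression techniques named in the survey abstract, pruning and quantization, can be sketched on a flat list of weights. This is a minimal illustration assuming unstructured magnitude pruning and symmetric per-tensor int8 quantization; the survey covers many variants and the abstract does not single these out.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [v * scale for v in q]


# Hypothetical weights: prune half of them, then quantize what remains.
example_weights = [0.5, -0.05, 1.27, 0.01]
pruned = magnitude_prune(example_weights, sparsity=0.5)
codes, scale = quantize_int8(pruned)
```

The two steps compose naturally: pruning decides which weights survive, and quantization decides how many bits each survivor costs. The round-trip error of each surviving weight is bounded by half the scale.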
Multimodal fake news detection has become one of the most crucial issues on social media platforms. Although existing methods have achieved advanced performance, two main challenges persist: (1) underperforming fusion of multimodal news information due to rigid model architectures, and (2) weak generalization to fake news that contains only some of the modalities. To meet these challenges, we propose MUSE, a novel and flexible triple-path enhanced neural architecture search model. MUSE includes two dynamic paths for detecting partial-modality fake news and a static path for exploiting potential multimodal correlations. Experimental results show that MUSE achieves stable performance improvement over the baselines.
https://arxiv.org/abs/2501.14455
The rapid expansion of Large Language Models (LLMs) has posed significant challenges regarding the computational resources required for fine-tuning and deployment. Recent advancements in low-rank adapters have demonstrated their efficacy in parameter-efficient fine-tuning (PEFT) of these models. This retrospective paper comprehensively discusses innovative approaches that synergize low-rank representations with Neural Architecture Search (NAS) techniques, particularly weight-sharing super-networks. Robust solutions for compressing and fine-tuning large pre-trained models are developed by integrating these methodologies. Our analysis highlights the potential of these combined strategies to democratize the use of LLMs, making them more accessible for deployment in resource-constrained environments. The resulting models exhibit reduced memory footprints and faster inference times, paving the way for more practical and scalable applications of LLMs. Models and code are publicly available.
https://arxiv.org/abs/2501.16372
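The combination of low-rank adapters with weight-sharing super-networks can be illustrated with an adapter whose rank is chosen at run time by slicing shared matrices. This is a sketch under assumptions: the `alpha / rank` scaling follows the usual LoRA convention, and the slicing scheme is one plausible reading of "weight sharing", not necessarily the mechanism used in the work discussed.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(W, A, B, rank, alpha=1.0):
    """Return W + (alpha / rank) * B[:, :rank] @ A[:rank, :].

    W is d_out x d_in, B is d_out x r_max and A is r_max x d_in. Choosing
    rank < r_max slices a smaller adapter out of the shared matrices, so all
    ranks reuse the same parameters.
    """
    B_r = [row[:rank] for row in B]
    A_r = A[:rank]
    delta = matmul(B_r, A_r)
    s = alpha / rank
    return [[w + s * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


def adapter_params(d_out, d_in, rank):
    """Number of trainable parameters in a rank-`rank` adapter."""
    return rank * (d_out + d_in)
```

For a 4096 x 4096 layer, a rank-8 adapter trains 65,536 parameters against 16,777,216 in the full matrix, which is the source of the memory savings the abstract refers to.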
Neural Architecture Search (NAS) is a powerful approach to automating the design of efficient neural architectures. In contrast to traditional NAS methods, recently proposed one-shot NAS methods prove to be more efficient in performing NAS. One-shot NAS works by generating a single weight-sharing supernetwork that acts as a search space (container) of subnetworks. Despite its achievements, designing the one-shot search space remains a major challenge. In this work, we propose a search space design strategy for Vision Transformer (ViT)-based architectures. In particular, we convert the Segment Anything Model (SAM) into a weight-sharing supernetwork called SuperSAM. Our approach involves automating the search space design via layer-wise structured pruning and parameter prioritization. While the structured pruning applies probabilistic removal of certain transformer layers, parameter prioritization performs weight reordering and slicing of MLP blocks in the remaining layers. We train supernetworks on several datasets using the sandwich rule. For deployment, we enhance subnetwork discovery by utilizing a program autotuner to identify efficient subnetworks within the search space. The resulting subnetworks are 30-70% smaller in size compared to the original pre-trained SAM ViT-B, yet outperform the pre-trained model. Our work introduces a new and effective method for ViT NAS search-space design.
https://arxiv.org/abs/2501.08504
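The sandwich rule mentioned in the SuperSAM abstract is commonly understood as training, at every step, the largest subnetwork, the smallest subnetwork and a few randomly sampled ones. The sketch below shows only the sampling and the width slicing; the depth and width choices are hypothetical values, and `slice_mlp` assumes the rows have already been reordered by importance.

```python
import random

DEPTH_CHOICES = [6, 9, 12]  # hypothetical numbers of retained transformer layers
MLP_WIDTH_CHOICES = [1024, 2048, 3072]  # hypothetical MLP hidden sizes


def sandwich_sample(rng, num_random=2):
    """Return the subnet configs to train in one step: largest, smallest, then random."""
    largest = (max(DEPTH_CHOICES), max(MLP_WIDTH_CHOICES))
    smallest = (min(DEPTH_CHOICES), min(MLP_WIDTH_CHOICES))
    randoms = [
        (rng.choice(DEPTH_CHOICES), rng.choice(MLP_WIDTH_CHOICES))
        for _ in range(num_random)
    ]
    return [largest, smallest] + randoms


def slice_mlp(weight_rows, width):
    """Keep the first `width` hidden units of an MLP weight matrix (list of rows)."""
    return weight_rows[:width]
```

Always including the two extremes bounds the behaviour of every subnetwork in between, while the random samples spread training signal over the rest of the space.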
In the realm of neural architecture design, achieving high performance is largely reliant on the manual expertise of researchers. Despite the emergence of Neural Architecture Search (NAS) as a promising technique for automating this process, current NAS methods still require human input to expand the search space and cannot generate new architectures. This paper explores the potential of Transformers in comprehending neural architectures and their performance, with the objective of establishing the foundation for utilizing Transformers to generate novel networks. We propose the Token-based Architecture Transformer (TART), which predicts neural network performance without the need to train candidate networks. TART attains state-of-the-art performance on the DeepNets-1M dataset for performance prediction tasks without edge information, indicating the potential of Transformers to aid in discovering novel and high-performing neural architectures.
https://arxiv.org/abs/2501.02007
Event-based cameras are sensors that simulate the human eye, offering advantages such as high-speed robustness and low power consumption. Established Deep Learning techniques have shown effectiveness in processing event data. Chimera is a Block-Based Neural Architecture Search (NAS) framework specifically designed for Event-Based Object Detection, aiming to create a systematic approach for adapting RGB-domain processing methods to the event domain. The Chimera design space is constructed from various macroblocks, including Attention blocks, Convolutions, State Space Models, and MLP-mixer-based architectures, which provide a valuable trade-off between local and global processing capabilities, as well as varying levels of complexity. The results on the PErson Detection in Robotics (PEDRo) dataset demonstrated performance levels comparable to leading state-of-the-art models, alongside an average parameter reduction of 1.6 times.
https://arxiv.org/abs/2412.19646
Designing effective neural architectures poses a significant challenge in deep learning. While Neural Architecture Search (NAS) automates the search for optimal architectures, existing methods are often constrained by predetermined search spaces and may miss critical neural architectures. In this paper, we introduce NADER (Neural Architecture Design via multi-agEnt collaboRation), a novel framework that formulates neural architecture design (NAD) as an LLM-based multi-agent collaboration problem. NADER employs a team of specialized agents to enhance a base architecture through iterative modification. Current LLM-based NAD methods typically operate independently, lacking the ability to learn from past experiences, which results in repeated mistakes and inefficient exploration. To address this issue, we propose the Reflector, which effectively learns from immediate feedback and long-term experiences. Additionally, unlike previous LLM-based methods that use code to represent neural architectures, we utilize a graph-based representation. This approach allows agents to focus on design aspects without being distracted by coding. We demonstrate the effectiveness of NADER in discovering high-performing architectures beyond predetermined search spaces through extensive experiments on benchmark tasks, showcasing its advantages over state-of-the-art methods. The code will be released soon.
https://arxiv.org/abs/2412.19206
In this paper, we reveal the intrinsic duality between graph neural networks (GNNs) and evolutionary algorithms (EAs), bridging two traditionally distinct fields. Building on this insight, we propose Graph Neural Evolution (GNE), a novel evolutionary algorithm that models individuals as nodes in a graph and leverages designed frequency-domain filters to balance global exploration and local exploitation. Through the use of these filters, GNE aggregates high-frequency (diversity-enhancing) and low-frequency (stability-promoting) information, transforming EAs into interpretable and tunable mechanisms in the frequency domain. Extensive experiments on benchmark functions demonstrate that GNE consistently outperforms state-of-the-art algorithms such as GA, DE, CMA-ES, SDAES, and RL-SHADE, excelling in complex landscapes, optimal solution shifts, and noisy environments. Its robustness, adaptability, and superior convergence highlight its practical and theoretical value. Beyond optimization, GNE establishes a conceptual and mathematical foundation linking EAs and GNNs, offering new perspectives for both fields. Its framework encourages the development of task-adaptive filters and hybrid approaches for EAs, while its insights can inspire advances in GNNs, such as improved global information propagation and mitigation of oversmoothing. GNE's versatility extends to solving challenges in machine learning, including hyperparameter tuning and neural architecture search, as well as real-world applications in engineering and operations research. By uniting the dynamics of EAs with the structural insights of GNNs, this work provides a foundation for interdisciplinary innovation, paving the way for scalable and interpretable solutions to complex optimization problems.
https://arxiv.org/abs/2412.17629
Existing efforts to boost multimodal fusion of 3D anomaly detection (3D-AD) primarily concentrate on devising more effective multimodal fusion strategies. However, little attention has been devoted to analyzing the role of multimodal fusion architecture (topology) design in contributing to 3D-AD. In this paper, we aim to bridge this gap and present a systematic study on the impact of multimodal fusion architecture design on 3D-AD. This work considers the multimodal fusion architecture design at the intra-module fusion level, i.e., independent modality-specific modules, involving early, middle or late multimodal features with specific fusion operations, and also at the inter-module fusion level, i.e., the strategies to fuse those modules. In both cases, we first derive insights by theoretically and experimentally exploring how architectural designs influence 3D-AD. Then, we extend the SOTA neural architecture search (NAS) paradigm and propose 3D-ADNAS to simultaneously search across multimodal fusion strategies and modality-specific modules for the first time. Experiments show that 3D-ADNAS obtains consistent improvements in 3D-AD across various model capacities in terms of accuracy, frame rate, and memory usage, and it exhibits great potential in dealing with few-shot 3D-AD tasks.
https://arxiv.org/abs/2412.17297
Learning curve extrapolation predicts neural network performance from early training epochs and has been applied to accelerate AutoML, facilitating hyperparameter tuning and neural architecture search. However, existing methods typically model the evolution of learning curves in isolation, neglecting the impact of neural network (NN) architectures, which influence the loss landscape and learning trajectories. In this work, we explore whether incorporating neural network architecture improves learning curve modeling and how to effectively integrate this architectural information. Motivated by the dynamical system view of optimization, we propose a novel architecture-aware neural differential equation model to forecast learning curves continuously. We empirically demonstrate its ability to capture the general trend of fluctuating learning curves while quantifying uncertainty through variational parameters. Our model outperforms current state-of-the-art learning curve extrapolation methods and pure time-series modeling approaches for both MLP and CNN-based learning curves. Additionally, we explore the applicability of our method in Neural Architecture Search scenarios, such as training configuration ranking.
https://arxiv.org/abs/2412.15554
Neural architecture search (NAS) enables finding the best-performing architecture from a search space automatically. Most NAS methods exploit an over-parameterized network (i.e., a supernet) containing all possible architectures (i.e., subnets) in the search space. However, the subnets that share the same set of parameters are likely to have different characteristics, interfering with each other during training. To address this, few-shot NAS methods have been proposed that divide the space into a few subspaces and employ a separate supernet for each subspace to limit the extent of weight sharing. They achieve state-of-the-art performance, but the computational cost increases accordingly. We introduce in this paper a novel few-shot NAS method that exploits the number of nonlinear functions to split the search space. To be specific, our method divides the space such that each subspace consists of subnets with the same number of nonlinear functions. Our splitting criterion is efficient, since it does not require comparing gradients of a supernet to split the space. In addition, we have found that dividing the space allows us to reduce the channel dimensions required for each supernet, which enables training multiple supernets in an efficient manner. We also introduce a supernet-balanced sampling (SBS) technique, sampling several subnets at each training step, to train different supernets evenly within a limited number of training steps. Extensive experiments on standard NAS benchmarks demonstrate the effectiveness of our approach. Our code is publicly available.
https://arxiv.org/abs/2412.14678
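The splitting criterion described in the few-shot NAS abstract, grouping subnets by their number of nonlinear functions, can be sketched on a toy cell. The operation list and the rule that only convolutions count as nonlinear are assumptions made for illustration; the abstract does not specify either.

```python
from collections import defaultdict
from itertools import product

OPS = ["skip", "conv3x3", "conv1x1", "avg_pool"]  # hypothetical candidate operations
NONLINEAR = {"conv3x3", "conv1x1"}  # assumption: each conv op carries one nonlinearity


def count_nonlinear(arch):
    """Number of nonlinear operations in an architecture (tuple of op names)."""
    return sum(op in NONLINEAR for op in arch)


def split_by_nonlinearity(num_edges):
    """Partition all architectures with `num_edges` edges by nonlinear-op count."""
    subspaces = defaultdict(list)
    for arch in product(OPS, repeat=num_edges):
        subspaces[count_nonlinear(arch)].append(arch)
    return dict(subspaces)
```

The criterion needs only a count per architecture, so the partition is computed without training or comparing supernet gradients. With 3 edges and these 4 operations, the 64 architectures fall into four subspaces of sizes 8, 24, 24 and 8.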
Understanding the deep meanings of the Qur'an and bridging the language gap between modern standard Arabic and classical Arabic is essential to improve the question-and-answer system for the Holy Qur'an. The Qur'an QA 2023 shared task dataset had a limited number of questions with weak model retrieval. To address this challenge, this work updated the original dataset and improved the model accuracy. The original dataset, which contains 251 questions, was reviewed and expanded to 629 questions with question diversification and reformulation, leading to a comprehensive set of 1895 categorized into single-answer, multi-answer, and zero-answer types. Extensive experiments fine-tuned transformer models, including AraBERT, RoBERTa, CAMeLBERT, AraELECTRA, and BERT. The best model, AraBERT-base, achieved a MAP@10 of 0.36 and MRR of 0.59, representing improvements of 63% and 59%, respectively, compared to the baseline scores (MAP@10: 0.22, MRR: 0.37). Additionally, the dataset expansion led to improvements in handling "no answer" cases, with the proposed approach achieving a 75% success rate for such instances, compared to the baseline's 25%. These results demonstrate the effect of dataset improvement and model architecture optimization in increasing the performance of QA systems for the Holy Qur'an, with higher accuracy, recall, and precision.
https://arxiv.org/abs/2412.11431
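The retrieval metrics quoted in the abstract above (MRR and MAP@10) and the relative improvements can be reproduced with a short Python sketch. The passage IDs and queries below are invented toy data, not from the Qur'an QA dataset. The normalisation used for average precision, dividing by min(number of relevant items, k), is one common convention and is an assumption here; the shared task's official scorer may differ, for example in how zero-answer questions are scored.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision_at_k(ranked, relevant, k=10):
    """Average precision over the top-k results (assumes unique items)."""
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(len(relevant), k)


def relative_improvement(new, baseline):
    """Fractional gain of `new` over `baseline`."""
    return (new - baseline) / baseline


# Invented toy queries: (ranked passage IDs, set of relevant passage IDs).
queries = [
    (["p3", "p1", "p7"], {"p1"}),
    (["p2", "p9", "p4"], {"p2", "p4"}),
]

mrr = sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)
map10 = sum(average_precision_at_k(r, rel) for r, rel in queries) / len(queries)
print(f"MRR={mrr:.3f}  MAP@10={map10:.3f}")

# Relative gains implied by the scores reported in the abstract.
print(f"MAP@10 gain: {relative_improvement(0.36, 0.22):.1%}")
print(f"MRR gain:    {relative_improvement(0.59, 0.37):.1%}")
```

With the rounded scores from the abstract, the computed gains come out near 63.6% and 59.5%, in line with the reported 63% and 59%.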
One-shot methods have significantly advanced the field of neural architecture search (NAS) by adopting a weight-sharing strategy to reduce search costs. However, the accuracy of performance estimation can be compromised by co-adaptation. Few-shot methods divide the entire supernet into individual sub-supernets by splitting edge by edge to alleviate this issue, yet they neglect the relationships among edges, which degrades performance on huge search spaces. In this paper, we introduce HEP-NAS, a hierarchy-wise partition algorithm designed to further enhance accuracy. To begin with, HEP-NAS treats edges sharing the same end node as a hierarchy, permuting and splitting edges within the same hierarchy to directly search for the optimal operation combination for each intermediate node. This approach aligns more closely with the ultimate goal of NAS. Furthermore, HEP-NAS selects the most promising sub-supernet after each segmentation, progressively narrowing the search space in which the optimal architecture may exist. To improve performance evaluation of sub-supernets, HEP-NAS employs search space mutual distillation, stabilizing the training process and accelerating the convergence of each individual sub-supernet. Within a given budget, HEP-NAS enables the splitting of all edges and gradually searches for architectures with higher accuracy. Experimental results across various datasets and search spaces demonstrate the superiority of HEP-NAS compared to state-of-the-art methods.
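One-shot methods have significantly advanced neural architecture search (NAS) by adopting a weight-sharing strategy to reduce search costs. However, the accuracy of performance estimation can be compromised by co-adaptation. Few-shot methods alleviate this issue by splitting the whole supernet edge by edge into separate sub-supernets, but they ignore the relationships among edges, which leads to degraded performance in huge search spaces. In this paper, we introduce HEP-NAS, a hierarchy-wise partition algorithm designed to further improve accuracy. First, HEP-NAS treats the edges that share the same end node as a hierarchy, and permutes and splits the edges within the same hierarchy to search directly for the optimal operation combination for each intermediate node. This approach is more consistent with the ultimate goal of NAS. In addition, HEP-NAS selects the most promising sub-supernet after each split, progressively narrowing the search space in which the optimal architecture may lie. To improve the performance evaluation of sub-supernets, HEP-NAS uses search space mutual distillation, which stabilizes training and accelerates the convergence of each individual sub-supernet. Within a given budget, HEP-NAS can split all edges and gradually search for architectures with higher accuracy. Experimental results across various datasets and search spaces show that HEP-NAS outperforms state-of-the-art methods.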
https://arxiv.org/abs/2412.10723
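The notion of a hierarchy in the HEP-NAS abstract above (edges that share the same end node, split jointly instead of one edge at a time) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the HEP-NAS algorithm: the cell topology, the candidate operations, the helper names, and especially `toy_score` (a stand-in for whatever evaluation is used to judge a sub-supernet) are hypothetical.

```python
import itertools
from collections import defaultdict

# Hypothetical cell: nodes 0 and 1 are inputs, nodes 2 and 3 are intermediate.
EDGES = [(0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]
OPS = ("skip", "conv3x3", "max_pool")


def group_by_end_node(edges):
    """Edges sharing the same end node form one hierarchy."""
    hierarchies = defaultdict(list)
    for src, dst in edges:
        hierarchies[dst].append((src, dst))
    return dict(hierarchies)


def split_hierarchy(hierarchy_edges, ops):
    """One candidate per joint assignment of operations to the edges."""
    return [
        dict(zip(hierarchy_edges, combo))
        for combo in itertools.product(ops, repeat=len(hierarchy_edges))
    ]


def pick_most_promising(candidates, score_fn):
    """Keep the candidate with the highest score under score_fn."""
    return max(candidates, key=score_fn)


def toy_score(assignment):
    """Placeholder score: counts conv3x3 edges. Not a real accuracy proxy."""
    return sum(op == "conv3x3" for op in assignment.values())


hierarchies = group_by_end_node(EDGES)
candidates = split_hierarchy(hierarchies[2], OPS)
best = pick_most_promising(candidates, toy_score)
print(len(candidates), best)
```

Splitting a hierarchy with n edges and m candidate operations yields m^n candidates, so selecting one sub-supernet after each split, as the abstract describes, is what keeps the search from growing combinatorially.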
Deep learning-based image generation has undergone a paradigm shift since 2021, marked by fundamental architectural breakthroughs and computational innovations. Through reviewing architectural innovations and empirical results, this paper analyzes the transition from traditional generative methods to advanced architectures, with focus on compute-efficient diffusion models and vision transformer architectures. We examine how recent developments in Stable Diffusion, DALL-E, and consistency models have redefined the capabilities and performance boundaries of image synthesis, while addressing persistent challenges in efficiency and quality. Our analysis focuses on the evolution of latent space representations, cross-attention mechanisms, and parameter-efficient training methodologies that enable accelerated inference under resource constraints. While more efficient training methods enable faster inference, advanced control mechanisms like ControlNet and regional attention systems have simultaneously improved generation precision and content customization. We investigate how enhanced multi-modal understanding and zero-shot generation capabilities are reshaping practical applications across industries. Our analysis demonstrates that despite remarkable advances in generation quality and computational efficiency, critical challenges remain in developing resource-conscious architectures and interpretable generation systems for industrial applications. The paper concludes by mapping promising research directions, including neural architecture optimization and explainable generation frameworks.
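Since 2021, deep learning-based image generation has undergone a paradigm shift, marked by architectural breakthroughs and computational innovations. By reviewing architectural innovations and empirical results, this paper analyzes the transition from traditional generative methods to advanced architectures, with particular attention to compute-efficient diffusion models and vision transformer architectures. We examine how recent developments such as Stable Diffusion, DALL-E, and consistency models have redefined the capabilities and performance boundaries of image synthesis while addressing persistent challenges in efficiency and quality. Our analysis centers on the evolution of latent space representations, cross-attention mechanisms, and parameter-efficient training methods, which enable accelerated inference under resource constraints. While more efficient training methods speed up inference, advanced control mechanisms such as ControlNet and regional attention systems have at the same time improved generation precision and content customization. We also investigate how enhanced multi-modal understanding and zero-shot generation capabilities are reshaping practical applications across industries. The analysis shows that, despite remarkable progress in generation quality and computational efficiency, major challenges remain in developing resource-conscious architectures and interpretable generation systems for industrial applications. The paper concludes by outlining promising research directions, including neural architecture optimization and explainable generation frameworks.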
https://arxiv.org/abs/2412.09656