Neural Architecture Search (NAS) is a powerful approach to automating the design of efficient neural architectures. Compared with traditional NAS methods, recently proposed one-shot NAS methods have proven more efficient. One-shot NAS works by generating a single weight-sharing supernetwork that acts as a search space (container) of subnetworks. Despite its achievements, designing the one-shot search space remains a major challenge. In this work, we propose a search space design strategy for Vision Transformer (ViT)-based architectures. In particular, we convert the Segment Anything Model (SAM) into a weight-sharing supernetwork called SuperSAM. Our approach automates search space design via layer-wise structured pruning and parameter prioritization. Structured pruning probabilistically removes certain transformer layers, while parameter prioritization reorders and slices the weights of the MLP blocks in the remaining layers. We train the supernetworks on several datasets using the sandwich rule. For deployment, we enhance subnetwork discovery by using a program autotuner to identify efficient subnetworks within the search space. The resulting subnetworks are 30-70% smaller than the original pre-trained SAM ViT-B, yet outperform the pre-trained model. Our work introduces a new and effective method for ViT NAS search-space design.
https://arxiv.org/abs/2501.08504
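The SuperSAM abstract describes parameter prioritization as reordering and slicing the weights of MLP blocks. Below is a minimal NumPy sketch of that idea. It is not the paper's implementation and rests on three assumptions: importance is the L1 norm of each hidden unit's incoming and outgoing weights, the block is a two-layer MLP, and ReLU stands in for GELU. Permuting hidden units leaves the full MLP's output unchanged, so keeping the first k units after sorting yields nested subnetworks.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    # Two-layer MLP block; ReLU stands in for the GELU used in ViT MLPs.
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def prioritize(w1, b1, w2):
    # Assumed importance score: L1 norm of each hidden unit's incoming and
    # outgoing weights. Units are sorted from most to least important.
    scores = np.abs(w1).sum(axis=0) + np.abs(w2).sum(axis=1)
    order = np.argsort(-scores)
    return w1[:, order], b1[order], w2[order, :]

def slice_mlp(w1, b1, w2, width):
    # A subnetwork keeps only the first `width` (highest-priority) hidden units.
    return w1[:, :width], b1[:width], w2[:width, :]

rng = np.random.default_rng(0)
dim, hidden = 8, 32
w1, b1 = rng.normal(size=(dim, hidden)), rng.normal(size=hidden)
w2, b2 = rng.normal(size=(hidden, dim)), rng.normal(size=dim)
x = rng.normal(size=(4, dim))

pw1, pb1, pw2 = prioritize(w1, b1, w2)
full = mlp(x, w1, b1, w2, b2)
reordered = mlp(x, pw1, pb1, pw2, b2)  # same function: hidden units are only permuted
sw1, sb1, sw2 = slice_mlp(pw1, pb1, pw2, hidden // 2)
half = mlp(x, sw1, sb1, sw2, b2)        # subnetwork with half the MLP width
```

The ordering is fixed once, so subnetworks of different widths share their leading weights. This is what lets them live inside one weight-sharing supernetwork.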
In neural architecture design, achieving high performance relies largely on the manual expertise of researchers. Although Neural Architecture Search (NAS) has emerged as a promising technique for automating this process, current NAS methods still require human input to expand the search space and cannot generate new architectures. This paper explores the potential of Transformers to comprehend neural architectures and their performance, with the objective of laying the foundation for using Transformers to generate novel networks. We propose the Token-based Architecture Transformer (TART), which predicts neural network performance without training the candidate networks. TART attains state-of-the-art performance on the DeepNets-1M dataset for performance prediction tasks without edge information, indicating the potential of Transformers to aid in discovering novel, high-performing neural architectures.
https://arxiv.org/abs/2501.02007
Event-based cameras are sensors that mimic the human eye, offering advantages such as robustness at high speeds and low power consumption. Established deep learning techniques have proven effective at processing event data. Chimera is a block-based Neural Architecture Search (NAS) framework designed specifically for event-based object detection, aiming to provide a systematic approach for adapting RGB-domain processing methods to the event domain. The Chimera design space is built from various macroblocks, including attention blocks, convolutions, state space models, and MLP-mixer-based architectures, which offer a useful trade-off between local and global processing capabilities as well as varying levels of complexity. Results on the PErson Detection in Robotics (PEDRo) dataset show performance comparable to leading state-of-the-art models, with an average parameter reduction of 1.6 times.
https://arxiv.org/abs/2412.19646
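To make "a design space constructed from macroblocks" concrete, here is a small sketch of a block-based space. In it, every stage independently picks one macroblock type and a depth. The stage count, depth range, and block names are hypothetical placeholders, not Chimera's actual configuration.

```python
import itertools
import random

# Hypothetical block-based design space: each stage picks one macroblock type
# and a depth. Names and ranges are illustrative, not Chimera's actual space.
BLOCK_TYPES = ["attention", "convolution", "state_space", "mlp_mixer"]
DEPTHS = [1, 2, 3]
NUM_STAGES = 4

def space_size():
    # Each stage independently chooses a (block type, depth) pair.
    return (len(BLOCK_TYPES) * len(DEPTHS)) ** NUM_STAGES

def sample_architecture(rng):
    # Uniform random sample: one (block type, depth) pair per stage.
    return [(rng.choice(BLOCK_TYPES), rng.choice(DEPTHS)) for _ in range(NUM_STAGES)]

def enumerate_space():
    # Exhaustive enumeration, feasible only because this toy space is small.
    stage_choices = list(itertools.product(BLOCK_TYPES, DEPTHS))
    return itertools.product(stage_choices, repeat=NUM_STAGES)

rng = random.Random(0)
arch = sample_architecture(rng)
```

Mixing block types per stage is what lets a searched network trade local processing (convolutions) against global processing (attention, state space models) at different depths.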
Designing effective neural architectures poses a significant challenge in deep learning. While Neural Architecture Search (NAS) automates the search for optimal architectures, existing methods are often constrained by predetermined search spaces and may miss critical neural architectures. In this paper, we introduce NADER (Neural Architecture Design via multi-agEnt collaboRation), a novel framework that formulates neural architecture design (NAD) as an LLM-based multi-agent collaboration problem. NADER employs a team of specialized agents to enhance a base architecture through iterative modification. Current LLM-based NAD methods typically operate independently and lack the ability to learn from past experiences, which results in repeated mistakes and inefficient exploration. To address this issue, we propose the Reflector, which effectively learns from immediate feedback and long-term experiences. Additionally, unlike previous LLM-based methods that use code to represent neural architectures, we use a graph-based representation. This approach allows agents to focus on design aspects without being distracted by coding. Extensive experiments on benchmark tasks demonstrate that NADER discovers high-performing architectures beyond predetermined search spaces and show its advantages over state-of-the-art methods. The code will be released soon.
https://arxiv.org/abs/2412.19206
In this paper, we reveal the intrinsic duality between graph neural networks (GNNs) and evolutionary algorithms (EAs), bridging two traditionally distinct fields. Building on this insight, we propose Graph Neural Evolution (GNE), a novel evolutionary algorithm that models individuals as nodes in a graph and leverages designed frequency-domain filters to balance global exploration and local exploitation. Through the use of these filters, GNE aggregates high-frequency (diversity-enhancing) and low-frequency (stability-promoting) information, transforming EAs into interpretable and tunable mechanisms in the frequency domain. Extensive experiments on benchmark functions demonstrate that GNE consistently outperforms state-of-the-art algorithms such as GA, DE, CMA-ES, SDAES, and RL-SHADE, excelling in complex landscapes, optimal solution shifts, and noisy environments. Its robustness, adaptability, and superior convergence highlight its practical and theoretical value. Beyond optimization, GNE establishes a conceptual and mathematical foundation linking EAs and GNNs, offering new perspectives for both fields. Its framework encourages the development of task-adaptive filters and hybrid approaches for EAs, while its insights can inspire advances in GNNs, such as improved global information propagation and mitigation of oversmoothing. GNE's versatility extends to solving challenges in machine learning, including hyperparameter tuning and neural architecture search, as well as real-world applications in engineering and operations research. By uniting the dynamics of EAs with the structural insights of GNNs, this work provides a foundation for interdisciplinary innovation, paving the way for scalable and interpretable solutions to complex optimization problems.
https://arxiv.org/abs/2412.17629
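The GNE abstract describes individuals as graph nodes whose positions are passed through frequency-domain filters. These filters mix low-frequency (stability) and high-frequency (diversity) components. The NumPy sketch below is illustrative only and is not the paper's algorithm. It assumes:

- a Gaussian-kernel similarity graph;
- the combinatorial Laplacian L = D - A, with eigendecomposition L = U diag(λ) Uᵀ;
- a hand-picked filter h(λ) = α·exp(-λ) + β·λ/λ_max;
- Gaussian mutation and greedy one-to-one replacement.

The filtered population is U diag(h(λ)) Uᵀ X.

```python
import numpy as np

def sphere(x):
    # Benchmark objective: minimum 0 at the origin.
    return np.sum(x ** 2, axis=1)

def graph_filter(pop, alpha=1.0, beta=0.5, sigma=1.0):
    # Gaussian-kernel similarity graph over individuals (nodes); assumed construction.
    sq = np.sum((pop[:, None, :] - pop[None, :, :]) ** 2, axis=-1)
    adj = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(adj, 0.0)
    lap = np.diag(adj.sum(axis=1)) - adj      # combinatorial Laplacian L = D - A
    lam, u = np.linalg.eigh(lap)              # graph frequencies and Fourier basis
    lam_max = max(lam.max(), 1e-12)
    # Low-pass term (stability) plus high-pass term (diversity); assumed filter shape.
    h = alpha * np.exp(-lam) + beta * lam / lam_max
    return u @ np.diag(h) @ u.T @ pop

def gne_like_search(f, dim=5, n=20, iters=100, mut=0.1, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5.0, 5.0, size=(n, dim))
    fit = f(pop)
    history = [fit.min()]
    for _ in range(iters):
        child = graph_filter(pop) + mut * rng.normal(size=pop.shape)
        child_fit = f(child)
        better = child_fit < fit              # greedy one-to-one replacement
        pop[better] = child[better]
        fit[better] = child_fit[better]
        history.append(fit.min())
    return pop[np.argmin(fit)], history
```

With β = 0 the filter is purely low-pass and preserves the population mean, because the constant vector is the zero-frequency eigenvector. Raising β amplifies the differences between individuals.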
Existing efforts to boost multimodal fusion for 3D anomaly detection (3D-AD) primarily concentrate on devising more effective multimodal fusion strategies. However, little attention has been devoted to analyzing how the design of the multimodal fusion architecture (topology) contributes to 3D-AD. In this paper, we aim to bridge this gap and present a systematic study of the impact of multimodal fusion architecture design on 3D-AD. This work considers multimodal fusion architecture design at two levels. The first is the intra-module fusion level, i.e., independent modality-specific modules involving early, middle, or late multimodal features with specific fusion operations. The second is the inter-module fusion level, i.e., the strategies for fusing those modules. In both cases, we first derive insights by exploring, theoretically and experimentally, how architectural designs influence 3D-AD. We then extend the SOTA neural architecture search (NAS) paradigm and propose 3D-ADNAS, which for the first time simultaneously searches across multimodal fusion strategies and modality-specific modules. Experiments show that 3D-ADNAS consistently improves 3D-AD across various model capacities in terms of accuracy, frame rate, and memory usage, and that it shows great potential for few-shot 3D-AD tasks.
https://arxiv.org/abs/2412.17297
Learning curve extrapolation predicts neural network performance from early training epochs and has been applied to accelerate AutoML, facilitating hyperparameter tuning and neural architecture search. However, existing methods typically model the evolution of learning curves in isolation, neglecting the impact of neural network (NN) architectures, which influence the loss landscape and learning trajectories. In this work, we explore whether incorporating neural network architecture improves learning curve modeling and how to effectively integrate this architectural information. Motivated by the dynamical system view of optimization, we propose a novel architecture-aware neural differential equation model to forecast learning curves continuously. We empirically demonstrate its ability to capture the general trend of fluctuating learning curves while quantifying uncertainty through variational parameters. Our model outperforms current state-of-the-art learning curve extrapolation methods and pure time-series modeling approaches for both MLP and CNN-based learning curves. Additionally, we explore the applicability of our method in Neural Architecture Search scenarios, such as training configuration ranking.
https://arxiv.org/abs/2412.15554
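The learning-curve abstract treats the curve as the solution of an architecture-conditioned differential equation. The sketch below shows only the forecasting mechanics. The paper learns a neural vector field with variational parameters; here the vector field is hand-fixed instead: dy/dt = rate·(limit - y). The `rate` and `limit` values come from a hypothetical linear map of an architecture feature vector, and the curve is extrapolated with forward Euler.

```python
import numpy as np

def arch_params(arch_features, w_rate, w_limit):
    # Hypothetical mapping from an architecture encoding to ODE parameters:
    # a positive convergence rate (softplus) and an asymptote in (0, 1) (sigmoid).
    rate = np.log1p(np.exp(arch_features @ w_rate))
    limit = 1.0 / (1.0 + np.exp(-(arch_features @ w_limit)))
    return rate, limit

def vector_field(y, rate, limit):
    # dy/dt = rate * (limit - y): the curve relaxes toward its asymptote.
    return rate * (limit - y)

def extrapolate(y0, arch_features, w_rate, w_limit, t_end, steps=1000):
    # Forward-Euler integration from the last observed value y0 up to t_end epochs.
    rate, limit = arch_params(arch_features, w_rate, w_limit)
    dt = t_end / steps
    y = y0
    traj = [y]
    for _ in range(steps):
        y = y + dt * vector_field(y, rate, limit)
        traj.append(y)
    return np.array(traj), rate, limit
```

For this particular field, the exact solution is y(t) = limit - (limit - y0)·exp(-rate·t), which makes the Euler integrator easy to check. A learned field would have no closed form, and that is why a numerical solver is used in general.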
Neural architecture search (NAS) automatically finds the best-performing architecture in a search space. Most NAS methods exploit an over-parameterized network (i.e., a supernet) containing all possible architectures (i.e., subnets) in the search space. However, subnets that share the same set of parameters are likely to have different characteristics and interfere with each other during training. To address this, few-shot NAS methods divide the space into a few subspaces and employ a separate supernet for each subspace to limit the extent of weight sharing. They achieve state-of-the-art performance, but their computational cost increases accordingly. In this paper, we introduce a novel few-shot NAS method that uses the number of nonlinear functions to split the search space. Specifically, our method divides the space such that each subspace consists of subnets with the same number of nonlinear functions. This splitting criterion is efficient, since it does not require comparing supernet gradients to split the space. In addition, we find that dividing the space lets us reduce the channel dimensions required for each supernet, which makes it possible to train multiple supernets efficiently. We also introduce a supernet-balanced sampling (SBS) technique that samples several subnets at each training step to train the different supernets evenly within a limited number of training steps. Extensive experiments on standard NAS benchmarks demonstrate the effectiveness of our approach. Our code is available at this https URL.
https://arxiv.org/abs/2412.14678
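The splitting criterion in this abstract can be illustrated on a toy search space. In the sketch, each subspace collects the subnets with the same number of nonlinear operations. A balanced sampler then draws one subnet per subspace at every step. The operation names, the layer count, and the "one subnet per supernet per step" rule are assumptions for illustration; the paper's SBS may sample differently.

```python
import itertools
import random
from collections import defaultdict

# Hypothetical candidate operations; True marks ops that contain a nonlinearity.
OPS = {"conv3x3_relu": True, "conv5x5_relu": True, "conv1x1": False, "skip": False}
NUM_LAYERS = 4

def num_nonlinear(arch):
    return sum(OPS[op] for op in arch)

def split_by_nonlinearity():
    # Each subspace holds every subnet with the same number of nonlinear ops;
    # no supernet gradients are needed to decide the split.
    subspaces = defaultdict(list)
    for arch in itertools.product(OPS, repeat=NUM_LAYERS):
        subspaces[num_nonlinear(arch)].append(arch)
    return dict(subspaces)

def balanced_sample(subspaces, rng):
    # Sketch of supernet-balanced sampling: one subnet from every subspace per
    # training step, so each supernet receives the same number of updates.
    return {k: rng.choice(archs) for k, archs in sorted(subspaces.items())}

subspaces = split_by_nonlinearity()
batch = balanced_sample(subspaces, random.Random(0))
```

With 4 layers and 2 nonlinear out of 4 candidate ops, the 256 subnets fall into 5 subspaces of sizes 16, 64, 96, 64, and 16. Uniform sampling over all 256 subnets would therefore update the middle supernet six times as often as the extreme ones. Per-subspace sampling avoids that imbalance.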
Understanding the deep meanings of the Qur'an and bridging the language gap between Modern Standard Arabic and Classical Arabic are essential for improving question-answering systems for the Holy Qur'an. The Qur'an QA 2023 shared task dataset had a limited number of questions, and model retrieval on it was weak. To address this challenge, this work updated the original dataset and improved model accuracy. The original dataset of 251 questions was reviewed and expanded to 629 questions through question diversification and reformulation. This yielded a comprehensive set of 1895 questions categorized into single-answer, multi-answer, and zero-answer types. Extensive experiments fine-tuned transformer models, including AraBERT, RoBERTa, CAMeLBERT, AraELECTRA, and BERT. The best model, AraBERT-base, achieved a MAP@10 of 0.36 and an MRR of 0.59, improvements of 63% and 59%, respectively, over the baseline scores (MAP@10: 0.22, MRR: 0.37). The dataset expansion also improved the handling of "no answer" cases: the proposed approach achieved a 75% success rate on such instances, compared with the baseline's 25%. These results show how dataset improvement and model architecture optimization increase the accuracy, recall, and precision of QA systems for the Holy Qur'an.
https://arxiv.org/abs/2412.11431
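The Qur'an QA abstract reports MAP@10 and MRR. The sketch below shows how these retrieval metrics are commonly computed, together with the relative-improvement arithmetic behind the 63% and 59% figures. Two choices are assumptions rather than the shared task's official scorer: questions with no relevant passage are skipped, and AP@k is normalized by min(|relevant|, k).

```python
def reciprocal_rank(ranked, relevant):
    # 1 / rank of the first relevant passage, or 0 if none is retrieved.
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0

def average_precision_at_k(ranked, relevant, k=10):
    hits, score = 0, 0.0
    for i, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            hits += 1
            score += hits / i  # precision at this cut-off
    return score / min(len(relevant), k)

def evaluate(run, qrels, k=10):
    # run: question id -> ranked passage ids; qrels: question id -> set of relevant ids.
    # Questions without any relevant passage are skipped in this sketch.
    qids = [q for q in run if qrels.get(q)]
    mrr = sum(reciprocal_rank(run[q], qrels[q]) for q in qids) / len(qids)
    map_k = sum(average_precision_at_k(run[q], qrels[q], k) for q in qids) / len(qids)
    return map_k, mrr

def relative_improvement(new, old):
    return (new - old) / old

run = {"q1": ["p3", "p1", "p7"], "q2": ["p2", "p5", "p9"]}
qrels = {"q1": {"p1"}, "q2": {"p2", "p9"}}
map_k, mrr = evaluate(run, qrels)
```

For the toy run, q1 has its only relevant passage at rank 2 (RR = AP = 0.5). q2 has hits at ranks 1 and 3, so RR = 1 and AP = (1 + 2/3) / 2. The averages are MAP@10 = 2/3 and MRR = 0.75. Likewise, (0.36 - 0.22) / 0.22 ≈ 0.636 and (0.59 - 0.37) / 0.37 ≈ 0.595, which match the reported 63% and 59% when truncated.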
One-shot methods have significantly advanced neural architecture search (NAS) by adopting a weight-sharing strategy to reduce search costs. However, co-adaptation can compromise the accuracy of performance estimation. Few-shot methods alleviate this issue by splitting the entire supernet edge by edge into individual sub-supernets. Yet they neglect the relationships among edges, which degrades performance in huge search spaces. In this paper, we introduce HEP-NAS, a hierarchy-wise partition algorithm designed to further improve accuracy. First, HEP-NAS treats edges that share the same end node as a hierarchy. It permutes and splits the edges within each hierarchy to directly search for the optimal operation combination for each intermediate node. This approach aligns more closely with the ultimate goal of NAS. Furthermore, HEP-NAS selects the most promising sub-supernet after each segmentation, progressively narrowing the search space in which the optimal architecture may exist. To improve the performance evaluation of sub-supernets, HEP-NAS employs search space mutual distillation, which stabilizes training and accelerates the convergence of each sub-supernet. Within a given budget, HEP-NAS can split all edges and gradually search for architectures with higher accuracy. Experimental results across various datasets and search spaces demonstrate the superiority of HEP-NAS over state-of-the-art methods.
https://arxiv.org/abs/2412.10723
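The HEP-NAS abstract groups the edges that share an end node into a hierarchy and splits over operation combinations within that hierarchy. The sketch below applies this to a DARTS-style cell. It makes three simplifying assumptions:

- the operation list is a small illustrative subset;
- a hypothetical `score` callable stands in for training and evaluating each sub-supernet;
- the best sub-supernet is kept greedily before moving to the next hierarchy, with mutual distillation omitted.

```python
import itertools

OPS = ["sep_conv_3x3", "max_pool_3x3", "skip_connect"]  # illustrative subset
NUM_INPUTS, NUM_INTERMEDIATE = 2, 3

def edges():
    # DARTS-style cell: intermediate node j receives edges from all earlier nodes i < j.
    return [(i, j) for j in range(NUM_INPUTS, NUM_INPUTS + NUM_INTERMEDIATE) for i in range(j)]

def hierarchies():
    # A hierarchy groups the edges that share the same end node.
    groups = {}
    for i, j in edges():
        groups.setdefault(j, []).append((i, j))
    return groups

def hierarchy_partition(score):
    # Split one hierarchy at a time: every operation combination over its edges
    # defines a candidate sub-supernet; keep the best-scoring one and move on.
    # `score` is a hypothetical stand-in for evaluating a sub-supernet.
    fixed = {}
    for node, group in sorted(hierarchies().items()):
        candidates = (dict(zip(group, combo)) for combo in itertools.product(OPS, repeat=len(group)))
        best = max(candidates, key=lambda assignment: score({**fixed, **assignment}))
        fixed.update(best)
    return fixed
```

In this toy cell, splitting per hierarchy evaluates 3² + 3³ + 3⁴ = 117 candidates instead of the 3⁹ = 19683 joint assignments of all nine edges. This is the kind of saving that progressively narrowing the space aims for.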
Deep learning-based image generation has undergone a paradigm shift since 2021, marked by fundamental architectural breakthroughs and computational innovations. Through reviewing architectural innovations and empirical results, this paper analyzes the transition from traditional generative methods to advanced architectures, with focus on compute-efficient diffusion models and vision transformer architectures. We examine how recent developments in Stable Diffusion, DALL-E, and consistency models have redefined the capabilities and performance boundaries of image synthesis, while addressing persistent challenges in efficiency and quality. Our analysis focuses on the evolution of latent space representations, cross-attention mechanisms, and parameter-efficient training methodologies that enable accelerated inference under resource constraints. While more efficient training methods enable faster inference, advanced control mechanisms like ControlNet and regional attention systems have simultaneously improved generation precision and content customization. We investigate how enhanced multi-modal understanding and zero-shot generation capabilities are reshaping practical applications across industries. Our analysis demonstrates that despite remarkable advances in generation quality and computational efficiency, critical challenges remain in developing resource-conscious architectures and interpretable generation systems for industrial applications. The paper concludes by mapping promising research directions, including neural architecture optimization and explainable generation frameworks.
https://arxiv.org/abs/2412.09656
Condition monitoring of induction machines is crucial to prevent costly interruptions and equipment failure. Mechanical faults such as misalignment and rotor issues are among the most common problems encountered in industrial environments. To monitor and detect these faults effectively, a variety of sensors are employed in the field, including accelerometers, current sensors, temperature sensors, and microphones. As a non-contact alternative, thermal imaging offers a powerful monitoring solution by capturing temperature variations in machines with thermal cameras. In this study, we propose using 2-dimensional Self-Organized Operational Neural Networks (Self-ONNs) to diagnose misalignment and broken rotor faults from thermal images of squirrel-cage induction motors. We evaluate our approach by benchmarking it against widely used Convolutional Neural Networks (CNNs), including ResNet, EfficientNet, PP-LCNet, SEMNASNet, and MixNet, using a Workswell InfraRed Camera (WIC). Our results demonstrate that Self-ONNs, with their non-linear neurons and self-organizing capability, achieve diagnostic performance comparable to more complex CNN models while using a shallower architecture with just three operational layers. This streamlined architecture delivers high performance and is well suited to deployment on edge devices. It can also be used in more complex multi-function and/or multi-device monitoring systems.
https://arxiv.org/abs/2412.05901
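The Self-ONN abstract credits "non-linear neurons" for matching deeper CNNs with only three operational layers. A common way to describe a Self-ONN generative neuron is as a truncated power series of its input, sum over q = 1..Q of conv(x^q, W_q), with one learnable kernel per order. The single-channel NumPy sketch below follows that description. Two details are assumptions made for illustration, not the paper's exact layer: inputs are squashed with tanh, and convolution is 'valid' cross-correlation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_valid(x, k):
    # 'valid' 2D cross-correlation of a single-channel image with a kernel.
    windows = sliding_window_view(x, k.shape)
    return np.einsum("ijkl,kl->ij", windows, k)

def generative_neuron(x, kernels, bias=0.0):
    # Self-ONN-style generative neuron (sketch): sum_q conv(x**q, W_q), q = 1..Q,
    # where Q = len(kernels). tanh bounds inputs to [-1, 1] so higher powers stay stable.
    x = np.tanh(x)
    return bias + sum(conv2d_valid(x ** q, w) for q, w in enumerate(kernels, start=1))

rng = np.random.default_rng(0)
image = rng.normal(size=(6, 6))
kernels = [rng.normal(size=(3, 3)) for _ in range(3)]  # Q = 3
out = generative_neuron(image, kernels)
```

With Q = 1 the neuron reduces to an ordinary convolutional neuron applied to tanh(x). Each extra order adds one kernel's worth of parameters but lets a single layer express a polynomial, rather than linear, response to its input.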
Artificial intelligence (AI) is widely used in fields including healthcare, autonomous vehicles, robotics, traffic monitoring, and agriculture. Many modern AI applications in these fields are multi-tasking in nature (i.e., they perform multiple analyses on the same data) and are deployed on resource-constrained edge devices. These AI models must therefore be efficient across metrics such as power, frame rate, and size. For these use cases, we propose a new neural network architecture paradigm (ILASH) that leverages layer sharing to minimize power usage, increase frame rate, and reduce model size. Additionally, we propose a novel neural architecture search framework (ILASH-NAS) for efficiently constructing these models for a given set of tasks and device constraints. The proposed NAS framework uses a data-driven intelligent approach to make the search efficient in terms of energy, time, and CO2 emissions. We extensively evaluate the proposed layer-shared architecture paradigm (ILASH) and the ILASH-NAS framework on four open-source datasets (UTKFace, MTFL, CelebA, and Taskonomy). Compared with AutoKeras, ILASH-NAS significantly improves both the performance of the generated models and the efficiency of the neural search, with up to 16x less energy use, CO2 emission, and training/search time.
https://arxiv.org/abs/2412.02116
Graph neural architecture search (GNAS) can customize high-performance graph neural network architectures for specific graph tasks or datasets. However, existing GNAS methods start searching from a zero-knowledge state, ignoring prior knowledge that could improve search efficiency. Available knowledge bases (e.g., NAS-Bench-Graph) contain many rich architectures together with multiple performance metrics, such as accuracy (#Acc) and number of parameters (#Params). This study proposes knowledge-aware evolutionary GNAS (KEGNAS), which exploits such prior knowledge to accelerate multi-objective evolutionary search on a new graph dataset. KEGNAS uses the knowledge base to train a knowledge model and a deep multi-output Gaussian process (DMOGP) in one go. Together, these generate and evaluate transfer architectures in only a few GPU seconds. The knowledge model first establishes a dataset-to-architecture mapping that can quickly generate candidate transfer architectures for a new dataset. The DMOGP, equipped with architecture and dataset encodings, then predicts multiple performance metrics of the candidate transfer architectures on the new dataset. Based on the predicted metrics, non-dominated candidate transfer architectures are selected to warm-start the multi-objective evolutionary algorithm that optimizes #Acc and #Params on the new dataset. Empirical studies on NAS-Bench-Graph and five real-world datasets show that KEGNAS swiftly generates top-performing architectures. Its accuracy is 4.27% higher than that of advanced evolutionary baselines and 11.54% higher than that of advanced differentiable baselines. In addition, ablation studies demonstrate that using prior knowledge significantly improves search performance.
https://arxiv.org/abs/2411.17339
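KEGNAS warm-starts its evolutionary search with the non-dominated candidate transfer architectures, where higher #Acc and lower #Params are both preferred. Below is a minimal sketch of that selection step, assuming candidates arrive as name-to-(predicted accuracy, predicted #params) pairs. The `warm_start` helper and its random top-up are hypothetical illustrations, not the paper's procedure.

```python
def dominates(a, b):
    # a, b are (accuracy, num_params); higher accuracy and fewer parameters are better.
    # a dominates b if it is no worse on both objectives and strictly better on one.
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])

def non_dominated(candidates):
    # candidates: name -> (predicted accuracy, predicted #params)
    return [name for name, m in candidates.items()
            if not any(dominates(o, m) for other, o in candidates.items() if other != name)]

def warm_start(candidates, random_pool, size):
    # Hypothetical helper: seed the population with predicted non-dominated
    # transfer architectures, then top it up with random architectures.
    seeds = non_dominated(candidates)[:size]
    return seeds + random_pool[: size - len(seeds)]

cands = {"a": (0.80, 1.2e5), "b": (0.78, 0.9e5), "c": (0.77, 1.0e5), "d": (0.81, 2.0e5)}
front = non_dominated(cands)
population = warm_start(cands, ["r1", "r2", "r3"], 5)
```

Here "c" is dropped because "b" is both more accurate and smaller. The remaining three trade accuracy against size, so all of them stay on the front.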
Previous Click-Through Rate (CTR) models have mainly evolved by proposing complex components, whether shallow or deep, that are adept at modeling feature interactions. However, less attention has been paid to improving fusion design. Instead, two naive solutions, stacked and parallel fusion, are commonly used. Both rely on pre-determined fusion connections and fixed fusion operations. It has been repeatedly observed that changes in fusion design can lead to different performance, highlighting the critical role fusion plays in CTR models. While there have been attempts to refine these basic fusion strategies, they have often been constrained to specific settings or dependent on specific components. Neural architecture search has also been introduced to partially address fusion design, but it has limitations: the complexity of the search space can lead to inefficient and ineffective results. To bridge this gap, we introduce OptFusion, a method that automates the learning of fusion, covering both connection learning and operation selection. We propose a one-shot learning algorithm that tackles both tasks concurrently. Extensive experiments on three large-scale datasets demonstrate both the effectiveness and the efficiency of OptFusion in improving CTR model performance. Our code implementation is available at this https URL.
https://arxiv.org/abs/2411.15731
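The OptFusion abstract contrasts stacked fusion (the second component reads the first component's output) with parallel fusion (both read the raw embedding). It proposes learning both the connection and the fusion operation. The NumPy sketch below relaxes both choices into continuous weights, DARTS-style. This relaxation is a swapped-in illustration, not necessarily OptFusion's own one-shot algorithm. The candidate operations and the scalar gate are likewise assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical candidate fusion operations (element-wise).
FUSION_OPS = {
    "sum": lambda a, b: a + b,
    "product": lambda a, b: a * b,
    "max": np.maximum,
}

def fused_output(emb, comp1, comp2, conn_logit, op_logits):
    # Connection learning (sketch): comp2 reads a sigmoid-gated mix of the raw
    # embedding (parallel fusion, gate -> 0) and comp1's output (stacked, gate -> 1).
    gate = 1.0 / (1.0 + np.exp(-conn_logit))
    h1 = comp1(emb)
    h2 = comp2(gate * h1 + (1.0 - gate) * emb)
    # Operation selection (sketch): softmax-weighted mixture of candidate fusion ops.
    weights = softmax(op_logits)
    return sum(w * op(h1, h2) for w, op in zip(weights, FUSION_OPS.values()))

emb = np.array([0.5, -1.0, 2.0])
parallel_sum = fused_output(emb, np.tanh, lambda v: 2.0 * v, -50.0, np.array([50.0, 0.0, 0.0]))
```

After search, such relaxed weights would typically be discretized: threshold the gate and take the argmax over `op_logits`. The result is a single fixed connection and fusion operation for the final CTR model.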
Neural Architecture Search (NAS) continues to play a key role in designing and developing neural networks for task-specific deployment. Modern NAS techniques struggle with ever-increasing search space complexity and compute cost constraints. Existing approaches fall into two buckets: fine-grained but computationally expensive NAS, and coarse-grained low-cost NAS. Our objective is an algorithm that performs fine-grained NAS at low cost. We propose projecting the problem to a lower-dimensional space by predicting the difference in accuracy between a pair of similar networks. This paradigm shift reduces computational complexity from exponential to linear with respect to the size of the search space. We provide a strong mathematical foundation for our algorithm in addition to extensive experimental results across a host of common NAS benchmarks. Our method significantly outperforms existing work, achieving better performance with significantly higher sample efficiency.
https://arxiv.org/abs/2411.14498
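The abstract's key move is to predict the accuracy difference between a pair of similar networks rather than each network's absolute accuracy. The sketch below shows how such a predictor can drive a search. It uses hill climbing over single-position edits and is an illustration, not the paper's algorithm. `predict_delta` is a hypothetical callable, and the toy additive "gains" in the usage example are invented purely for demonstration.

```python
OPS = ["conv3x3", "conv1x1", "avgpool", "skip"]

def neighbors(arch):
    # Architectures that differ from `arch` in exactly one position.
    for i, cur in enumerate(arch):
        for op in OPS:
            if op != cur:
                yield arch[:i] + (op,) + arch[i + 1:]

def local_search(start, predict_delta, max_steps=100):
    # Hill-climb using a predictor of acc(neighbor) - acc(current) instead of
    # absolute accuracy; each step scores len(arch) * (len(OPS) - 1) pairs.
    arch = start
    for _ in range(max_steps):
        best = max(neighbors(arch), key=lambda n: predict_delta(arch, n))
        if predict_delta(arch, best) <= 0:
            break  # no neighbor is predicted to improve on the current network
        arch = best
    return arch

# Toy stand-in for a learned pairwise predictor (invented per-op gains).
GAINS = {"conv3x3": 3, "conv1x1": 2, "avgpool": 1, "skip": 0}
toy_delta = lambda a, b: sum(GAINS[o] for o in b) - sum(GAINS[o] for o in a)
found = local_search(("skip",) * 4, toy_delta)
```

Comparing only near-identical pairs is what keeps each step's cost proportional to the number of single edits, instead of requiring a score for every architecture in the space.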
Dataset distillation aims to distill the knowledge of a large-scale real dataset into small yet informative synthetic data such that a model trained on it performs as well as a model trained on the full dataset. Despite recent progress, existing dataset distillation methods often struggle with computational efficiency, scalability to complex high-resolution datasets, and generalizability to deep architectures. These approaches typically require retraining when the distillation ratio changes, as knowledge is embedded in raw pixels. In this paper, we propose a novel framework called Data-to-Model Distillation (D2M) to distill the real dataset's knowledge into the learnable parameters of a pre-trained generative model by aligning rich representations extracted from real and generated images. The learned generative model can then produce informative training images for different distillation ratios and deep architectures. Extensive experiments on 15 datasets of varying resolutions show D2M's superior performance, re-distillation efficiency, and cross-architecture generalizability. Our method effectively scales up to high-resolution 128x128 ImageNet-1K. Furthermore, we verify D2M's practical benefits for downstream applications in neural architecture search.
https://arxiv.org/abs/2411.12841
Accurate segmentation of retinal blood vessels plays a crucial role in the early diagnosis and treatment of various ophthalmic diseases. Designing a network model for this task requires meticulous tuning and extensive experimentation to handle the tiny, intertwined morphology of retinal blood vessels. To tackle this challenge, Neural Architecture Search (NAS) methods have been developed to fully explore the space of potential network architectures and find the most powerful one. Neuronal diversity is the biological foundation of all kinds of intelligent behavior in the brain. Inspired by it, this paper introduces a novel and foundational approach to neural network design, termed "neuron programming". Neuron programming automatically searches for the neuronal types used in a network to enhance its representation ability at the neuronal level, complementing the architecture-level enhancement performed by NAS. Additionally, to reduce the time and computational cost of neuron programming, we develop a hypernetwork that leverages search-derived architectural information to predict optimal neuronal configurations. Comprehensive experiments validate that neuron programming achieves competitive performance in retinal blood vessel segmentation, demonstrating the strong potential of neuronal diversity in medical image analysis.
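Accurate segmentation of retinal blood vessels plays a key role in the early diagnosis and treatment of various ophthalmic diseases. Designing a network model for this task requires careful tuning and extensive experimentation to handle the fine, intricately intertwined morphology of retinal vessels. To address this challenge, Neural Architecture Search (NAS) methods have been developed to comprehensively explore the space of potential network architectures and pursue the most powerful one. Inspired by neuronal diversity, the biological foundation of all kinds of intelligent behavior in the brain, this paper proposes a novel, foundational neural network design method called "neuron programming", which automatically searches for different types of neurons and integrates them into the network to enhance its representational ability at the neuronal level; this complements the architecture-level optimization performed by NAS. In addition, to reduce the time and computational intensity of neuron programming, we develop a hypernetwork that uses search-derived architectural information to predict optimal neuronal configurations. Comprehensive experiments verify that neuron programming achieves competitive performance in retinal vessel segmentation, demonstrating the strong potential of neuronal diversity in medical image analysis.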
https://arxiv.org/abs/2411.11110
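To make "searching neuronal types into a network" concrete, the sketch below treats each layer as taking one neuron type from a small candidate set and exhaustively scores every per-layer assignment. Everything here is a hypothetical illustration: the two neuron types, the `score` callable (which in practice would train and validate a segmentation network), and the brute-force search, which the paper's hypernetwork is meant to replace.

```python
import itertools
from typing import Callable, Dict, Sequence, Tuple


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def linear_neuron(x: Sequence[float], w: Sequence[float], b: float = 0.0) -> float:
    """Conventional neuron: weighted sum of inputs plus bias (activation omitted)."""
    return _dot(w, x) + b


def quadratic_neuron(x: Sequence[float], w: Sequence[float], b: float = 0.0) -> float:
    """Illustrative alternative neuron type that also weights squared inputs.

    Hypothetical: chosen only to show a second neuron type, not taken from the paper.
    """
    return _dot(w, x) + _dot(w, [v * v for v in x]) + b


# Candidate neuron types the search may assign to each layer (hypothetical set).
NEURON_TYPES: Dict[str, Callable[..., float]] = {
    "linear": linear_neuron,
    "quadratic": quadratic_neuron,
}


def search_neuron_types(num_layers: int,
                        score: Callable[[Tuple[str, ...]], float]
                        ) -> Tuple[Tuple[str, ...], float]:
    """Exhaustively score every per-layer assignment of neuron types.

    `score` is a hypothetical callable that would build, train, and evaluate a
    network with the given configuration (higher is better). The number of
    configurations is len(NEURON_TYPES) ** num_layers.
    """
    best_config: Tuple[str, ...] = ()
    best_score = float("-inf")
    for config in itertools.product(NEURON_TYPES, repeat=num_layers):
        value = score(config)
        if value > best_score:
            best_config, best_score = config, value
    return best_config, best_score
```

Because the space grows as k^L for k neuron types and L layers, exhaustive search quickly becomes infeasible; that cost is what the abstract's hypernetwork, which predicts configurations from search-derived architectural information, is designed to reduce, and it is not modeled here.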
Deep learning has revolutionized computing in many real-world applications, arguably due to its remarkable performance and extreme convenience as an end-to-end solution. However, deep learning models can be costly to train and use, especially large-scale ones, so overly complicated models often need to be optimized into smaller ones, whether for resource-limited scenarios such as mobile applications or simply to save resources. The key question in such model optimization is how to effectively identify and measure the redundancy in a deep learning model's structure. While popular model optimization techniques rely on several common metrics to measure a model's performance after optimization, these metrics cannot quantify the degree of remaining redundancy. To address this problem, we present a novel testing approach, RedTest, which introduces a testing metric called the Model Structural Redundancy Score (MSRS) to quantitatively measure the degree of redundancy in a deep learning model structure. We first show that MSRS is effective in both revealing and assessing redundancy issues in many state-of-the-art models, issues that urgently call for model optimization. We then use MSRS to assist deep learning model developers in two practical application scenarios: 1) in Neural Architecture Search, we design a novel redundancy-aware algorithm to guide the search for the optimal model structure and demonstrate its effectiveness by comparing it with existing standard NAS practice; 2) in pruning large-scale pre-trained models, we prune redundant layers under the guidance of layer similarity to derive much smaller, less redundant models. Extensive experimental results demonstrate that removing such redundancy has a negligible effect on model utility.
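Deep learning has fundamentally changed computing in many real-world applications, mainly owing to its excellent performance and its great convenience as an end-to-end solution. However, training and using deep learning models can be costly, especially for large-scale models; in resource-constrained settings such as mobile applications, or simply to save resources, the original overly complex models need to be optimized into smaller ones. The key question in such model optimization is how to effectively identify and measure redundancy in a deep learning model structure. Although popular model optimization techniques offer several common metrics for measuring a model's performance after optimization, they cannot quantitatively indicate the degree of remaining redundancy. To solve this problem, we propose a novel testing approach, RedTest, which introduces a new testing metric, the Model Structural Redundancy Score (MSRS), to quantitatively measure the degree of redundancy in a deep learning model structure. First, we show that MSRS is effective at revealing and assessing redundancy problems in many state-of-the-art models, problems that urgently call for model optimization. Then, we use MSRS to assist deep learning model developers in two practical application scenarios: 1) in Neural Architecture Search, we design a novel redundancy-aware algorithm to guide the search for the optimal model structure and demonstrate its effectiveness by comparison with existing standard NAS practice; 2) in pruning large-scale pre-trained models, we prune redundant layers of pre-trained models under the guidance of layer similarity to obtain new models that are smaller and less redundant. Extensive experimental results show that removing such redundancy has almost no effect on model utility.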
https://arxiv.org/abs/2411.10507
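The RedTest abstract does not define MSRS, so the sketch below does not attempt it. It only illustrates the second scenario, pruning guided by layer similarity, under an assumed criterion: a layer whose output representation is nearly parallel to its input (cosine similarity at or above a threshold) changes little and is flagged as a pruning candidate. The 0.99 threshold, the one-vector-per-layer summary of hidden states, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def redundant_layers(hidden_states: Sequence[Sequence[float]],
                     threshold: float = 0.99) -> List[int]:
    """Indices of layers whose output is nearly parallel to their input.

    hidden_states[i] is the (e.g. averaged) representation entering layer i
    and hidden_states[i + 1] the one leaving it, so a model with L layers
    supplies L + 1 vectors. This criterion is an assumption for
    illustration, not the paper's MSRS.
    """
    return [i for i in range(len(hidden_states) - 1)
            if cosine_similarity(hidden_states[i], hidden_states[i + 1]) >= threshold]


def prune(layers: Sequence, drop: Sequence[int]) -> List:
    """Return the layers that remain after removing the given indices."""
    drop_set = set(drop)
    return [layer for i, layer in enumerate(layers) if i not in drop_set]
```

Because cosine similarity ignores magnitude, a layer that merely rescales its input is flagged as redundant under this criterion; whether that is acceptable depends on the architecture (for example, whether a following normalization layer removes the scale).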
In the past few years, channel-wise and spatial-wise attention blocks have been widely adopted as supplementary modules in deep neural networks, enhancing representational ability while adding little complexity. Most attention modules follow a squeeze-and-excitation paradigm. However, designing such attention modules requires a substantial amount of experimentation and computational resources. Neural Architecture Search (NAS), meanwhile, can automate the design of neural networks and spares the numerous experiments otherwise required to find an optimal architecture. This motivates us to design a search architecture that automatically finds near-optimal attention modules through NAS. We propose SASE, a Searching Architecture for Squeeze and Excitation operations, which forms a plug-and-play attention block by searching within a defined search space. The search space is separated into 4 different sets, each corresponding to the squeeze or excitation operation along the channel or spatial dimension. Additionally, the search sets include not only existing attention blocks but also operations that have not previously been used in attention mechanisms. To the best of our knowledge, SASE is the first attempt to subdivide the attention search space and search for architectures beyond currently known attention modules. The searched attention module is evaluated with extensive experiments across a range of visual tasks. Experimental results indicate that visual backbone networks (ResNet-50/101) using the SASE attention module achieve the best performance compared with those using current state-of-the-art attention modules. Code is included in the supplementary material and will be made public later.
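In the past few years, channel attention modules and spatial attention modules have been widely used as supplementary modules in deep neural networks, enhancing the networks' representational ability while introducing low complexity. Most attention modules follow the squeeze-and-excitation paradigm. However, designing such attention modules requires a large amount of experimentation and computational resources. Meanwhile, Neural Architecture Search (NAS) can automate the design of neural networks and eliminates the many experiments needed to find an optimal architecture. This motivated us to design a search architecture that automatically finds near-optimal attention modules through NAS. To this end, we propose SASE (Searching Architecture for Squeeze and Excitation operations), which forms a plug-and-play attention block by searching within a specific search space. The search space is divided into 4 different sets, each corresponding to the squeeze or excitation operation along the channel or spatial dimension. In addition, the search sets include not only existing attention modules but also other operations not previously used in attention mechanisms. To the best of our knowledge, SASE is the first attempt to subdivide the attention search space and search for architectures beyond currently known attention modules. The searched attention module was tested on a range of visual tasks through extensive experiments. Experimental results show that visual backbone networks (ResNet-50/101) using the SASE attention module achieved the best performance compared with networks using existing state-of-the-art attention modules. The code is included in the supplementary material and will be made public later.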
https://arxiv.org/abs/2411.08333
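The SASE search space is split into four sets: channel squeeze, channel excitation, spatial squeeze, and spatial excitation. A minimal sketch of that decomposition, assuming a single feature map stored as nested lists `x[channel][row][col]`, is shown below. The candidate operations in each set are hypothetical placeholders (SASE's actual candidates are defined in the paper), the parameter-free excitations stand in for the learned layers a real module would use, and applying channel gating before spatial gating is an assumed ordering.

```python
import itertools
import math
from typing import Callable, Dict, List, Tuple

# Feature map for one sample, laid out as x[channel][row][col] (assumption).
FeatureMap = List[List[List[float]]]


def _sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def _hard_sigmoid(v: float) -> float:
    return max(0.0, min(1.0, v / 6.0 + 0.5))


# Four candidate sets mirroring the abstract's split; members are hypothetical.
CHANNEL_SQUEEZE: Dict[str, Callable[[List[List[float]]], float]] = {
    "avg": lambda plane: sum(map(sum, plane)) / (len(plane) * len(plane[0])),
    "max": lambda plane: max(map(max, plane)),
}
CHANNEL_EXCITE: Dict[str, Callable[[float], float]] = {
    "sigmoid": _sigmoid,
    "hard_sigmoid": _hard_sigmoid,
}
SPATIAL_SQUEEZE: Dict[str, Callable[[List[float]], float]] = {
    "avg": lambda values: sum(values) / len(values),
    "max": max,
}
SPATIAL_EXCITE: Dict[str, Callable[[float], float]] = dict(CHANNEL_EXCITE)

# Every combination of one choice per set is one candidate attention block.
SEARCH_SPACE = list(itertools.product(CHANNEL_SQUEEZE, CHANNEL_EXCITE,
                                      SPATIAL_SQUEEZE, SPATIAL_EXCITE))


def apply_attention(x: FeatureMap, config: Tuple[str, str, str, str]) -> FeatureMap:
    """Channel gating followed by spatial gating, built from one search choice."""
    cs, ce, ss, se = config
    # Channel attention: squeeze each H x W plane to a scalar, excite to a gate.
    gates = [CHANNEL_EXCITE[ce](CHANNEL_SQUEEZE[cs](plane)) for plane in x]
    x = [[[v * g for v in row] for row in plane] for plane, g in zip(x, gates)]
    # Spatial attention: squeeze across channels at each (h, w), excite to a gate.
    height, width = len(x[0]), len(x[0][0])
    spatial = [[SPATIAL_EXCITE[se](SPATIAL_SQUEEZE[ss]([plane[h][w] for plane in x]))
                for w in range(width)] for h in range(height)]
    return [[[plane[h][w] * spatial[h][w] for w in range(width)]
             for h in range(height)] for plane in x]
```

With two candidates per set this toy space has 2^4 = 16 blocks; a search could, for instance, take `max(SEARCH_SPACE, key=evaluate)` with a hypothetical `evaluate` that trains and validates a backbone using each block, although SASE uses NAS rather than exhaustive enumeration.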