This study presents an approach to neural architecture search (NAS) that uses Cartesian genetic programming (CGP) to design and optimize convolutional neural networks (CNNs). Proposing a novel architecture is a crucial step in designing artificial neural networks. Most architectures in current use were developed manually by human experts, which is a time-consuming and error-prone process. In this work, we use a pure genetic programming approach to design CNNs, one that employs a single genetic operation: mutation. In preliminary experiments, our methodology yields promising results.
本研究探讨了使用二维遗传编程(CGP)方法来搜索神经架构(NAS)以优化卷积神经网络(CNN)的设计。在设计人工神经网络时,创新方法的关键方面是提出一种新的神经网络架构。目前使用的架构主要是通过手动设计开发的,这是耗时且容易出错的过程。在这项研究中,我们使用纯遗传编程方法来设计CNN,该方法只使用一种遗传操作,即突变。在初步实验期间,我们的方法得到了积极的结果。
https://arxiv.org/abs/2410.00129
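A mutation-only search of the kind described in the abstract above can be sketched as a (1+λ) evolutionary loop over a CGP-style genotype. This is a minimal illustration, not the authors' implementation: the layer vocabulary `FUNCTIONS`, the single-input node encoding, the (1+λ) strategy, and `toy_fitness` (a stand-in for training a CNN and measuring accuracy) are all assumptions introduced here.

```python
import random

# Hypothetical layer vocabulary; the abstract does not list the function set.
FUNCTIONS = ["conv3x3", "conv5x5", "maxpool", "identity"]


def random_genotype(n_nodes, rng):
    # Node i+1 stores (function id, source index); source 0 is the network input.
    return [(rng.randrange(len(FUNCTIONS)), rng.randrange(i + 1)) for i in range(n_nodes)]


def mutate(genotype, rate, rng):
    # Point mutation: each gene is resampled independently with probability `rate`.
    child = []
    for i, (func, src) in enumerate(genotype):
        if rng.random() < rate:
            func = rng.randrange(len(FUNCTIONS))
        if rng.random() < rate:
            src = rng.randrange(i + 1)
        child.append((func, src))
    return child


def active_path(genotype):
    # Walk back from the last node to the input; nodes off this path are inactive.
    path, node = [], len(genotype)
    while node > 0:
        func, src = genotype[node - 1]
        path.append(FUNCTIONS[func])
        node = src
    return list(reversed(path))


def toy_fitness(genotype):
    # Stand-in for validation accuracy (an assumption): a real system would
    # decode the genotype into a CNN, train it, and return its accuracy.
    path = active_path(genotype)
    return path.count("conv3x3") + 0.5 * path.count("conv5x5") - 0.25 * path.count("identity")


def evolve(fitness, n_nodes=8, lam=4, generations=200, rate=0.2, seed=0):
    rng = random.Random(seed)
    parent = random_genotype(n_nodes, rng)
    best = fitness(parent)
    for _ in range(generations):
        # All lambda children are mutated copies of the same parent; no crossover.
        children = [mutate(parent, rate, rng) for _ in range(lam)]
        for child in children:
            score = fitness(child)
            if score >= best:  # ties are accepted, which allows neutral drift
                parent, best = child, score
    return parent, best


if __name__ == "__main__":
    winner, score = evolve(toy_fitness)
    print(active_path(winner), score)
```

Because only the active path is decoded, mutations to inactive nodes leave the fitness unchanged, which is why accepting ties matters in this style of search.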
Neural Architecture Search (NAS) automates neural network design, reducing dependence on human expertise. While NAS methods are computationally intensive and dataset-specific, auxiliary predictors reduce the models needing training, decreasing search time. This strategy is used to generate architectures satisfying multiple computational constraints. Recently, Transferable NAS has emerged, generalizing the search process from dataset-dependent to task-dependent. In this field, DiffusionNAG is a state-of-the-art method. This diffusion-based approach streamlines computation, generating architectures optimized for accuracy on unseen datasets without further adaptation. However, by focusing solely on accuracy, DiffusionNAG overlooks other crucial objectives like model complexity, computational efficiency, and inference latency -- factors essential for deploying models in resource-constrained environments. This paper introduces the Pareto-Optimal Many-Objective Neural Architecture Generator (POMONAG), extending DiffusionNAG via a many-objective diffusion process. POMONAG simultaneously considers accuracy, number of parameters, multiply-accumulate operations (MACs), and inference latency. It integrates Performance Predictor models to estimate these metrics and guide diffusion gradients. POMONAG's optimization is enhanced by expanding its training Meta-Dataset, applying Pareto Front Filtering, and refining embeddings for conditional generation. These enhancements enable POMONAG to generate Pareto-optimal architectures that outperform the previous state-of-the-art in performance and efficiency. Results were validated on two search spaces -- NASBench201 and MobileNetV3 -- and evaluated across 15 image classification datasets.
Neural Architecture Search (NAS) 自动化神经网络设计,减少了人们对专家技术的依赖。虽然NAS方法计算密集型且对数据集特定,辅助预测器减少了需要训练的模型数量,从而降低了搜索时间。这种策略用于生成满足多个计算约束的架构。最近,Transferable NAS emergence,从数据集依赖的搜索过程发展到任务依赖的搜索过程。在这个领域,DiffusionNAG是当前最先进的。这种扩散为基础的方法简化了计算,为未见过的数据集生成优化的架构,而无需进一步调整。然而,由于仅关注准确性,DiffusionNAG忽视了其他关键目标,如模型复杂性、计算效率和推理延迟——这些对于在资源受限的环境中部署模型至关重要。本文介绍了Pareto-Optimal Many-Objective Neural Architecture Generator (POMONAG),通过多个目标扩散过程将DiffusionNAG扩展。POMONAG同时考虑准确性、参数数量、多级累积操作(MACs)和推理延迟。它将性能预测器模型集成到其中,估计这些指标并引导扩散梯度。POMONAG通过扩展训练元数据集、应用Pareto前沿过滤和优化嵌入条件生成来进行优化。这些改进使POMONAG能够生成在性能和效率上优于现有技术的最优架构。结果在NASBench201和MobileNetV3上进行了验证,并评估了15个图像分类数据集。
https://arxiv.org/abs/2409.20447
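The Pareto Front Filtering step mentioned in the abstract above can be illustrated with a small dominance check over the four objectives it names (accuracy, parameters, MACs, latency). This is a sketch under stated assumptions: the `Candidate` record, the function names, and the numbers in the usage example are invented for illustration and do not come from POMONAG.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    accuracy: float  # higher is better
    params: float    # lower is better
    macs: float      # lower is better
    latency: float   # lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    # a dominates b if it is no worse on every objective and strictly better on one.
    no_worse = (
        a.accuracy >= b.accuracy
        and a.params <= b.params
        and a.macs <= b.macs
        and a.latency <= b.latency
    )
    strictly_better = (
        a.accuracy > b.accuracy
        or a.params < b.params
        or a.macs < b.macs
        or a.latency < b.latency
    )
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    # Keep every candidate that no other candidate dominates.
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


if __name__ == "__main__":
    # Illustrative numbers only.
    pool = [
        Candidate("A", 0.90, 5.0, 300.0, 10.0),
        Candidate("B", 0.85, 3.0, 200.0, 8.0),
        Candidate("C", 0.84, 4.0, 250.0, 9.0),   # dominated by B
        Candidate("D", 0.90, 6.0, 300.0, 10.0),  # dominated by A
    ]
    print([c.name for c in pareto_front(pool)])
```

The quadratic scan is fine for small pools; larger ones would usually sort on one objective first.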
The neurological condition known as cerebral palsy (CP) first manifests in infancy or early childhood and has a lifelong impact on motor coordination and body movement. CP is one of the leading causes of childhood disabilities, and early detection is crucial for providing appropriate treatment. However, such detection relies on assessments by human experts trained in methods like general movement assessment (GMA). These are not widely accessible, especially in developing countries. Conventional machine learning approaches offer limited predictive performance on CP detection tasks, and the approaches developed by the few available domain experts are generally dataset-specific, restricting their applicability beyond the context for which these were created. To address these challenges, we propose a neural architecture search (NAS) algorithm applying a reinforcement learning update scheme capable of efficiently optimizing for the best architectural and hyperparameter combination to discover the most suitable neural network configuration for detecting CP. Our method performs better on a real-world CP dataset than other approaches in the field, which rely on large ensembles. As our approach is less resource-demanding and performs better, it is particularly suitable for implementation in resource-constrained settings, including rural or developing areas with limited access to medical experts and the required diagnostic tools. The resulting model's lightweight architecture and efficient computation time allow for deployment on devices with limited processing power, reducing the need for expensive infrastructure, and can, therefore, be integrated into clinical workflows to provide timely and accurate support for early CP diagnosis.
脑性瘫痪(CP)是一种在婴儿或早期童年时期首次出现的神经系统疾病,并对其终身运动协调和身体运动能力产生影响。CP是儿童残疾的主要原因之一,因此早期诊断至关重要。然而,这种诊断依赖于经过训练的人类专家使用的方法进行评估。这些方法并不广泛可用,尤其是在发展中国家。传统的机器学习方法在CP检测任务上的预测表现有限,而由少数领域专家开发的方法通常仅适用于其创建的数据集,限制了它们的适用范围。为了应对这些挑战,我们提出了一个神经架构搜索(NAS)算法,该算法应用了一种强化学习更新方案,能够高效地优化最佳架构和超参数组合以发现最适合检测CP的神经网络配置。我们的方法在真实世界CP数据集上的表现优于其他方法,这些方法依赖于大的集成。由于我们的方法资源需求较少,并且表现更好,因此特别适用于在资源受限的环境中实施,包括农村或发展地区,这些地区医疗专家的接触有限,并且缺乏所需的诊断工具。最终模型的轻量级架构和高效的计算时间允许在处理能力有限的设备上进行部署,减少昂贵的设施的需求,因此可以集成到临床工作流程中,为早期CP诊断提供及时和准确的支持。
https://arxiv.org/abs/2409.20060
Bayesian optimization (BO) is a powerful framework for optimizing black-box, expensive-to-evaluate functions via sequential interactions. In several important problems (e.g., drug discovery, circuit design, and neural architecture search), however, such functions are defined over large $\textit{combinatorial and unstructured}$ spaces. This makes existing BO algorithms infeasible, because maximizing the acquisition function over these domains is intractable. To address this issue, we propose $\textbf{GameOpt}$, a novel game-theoretical approach to combinatorial BO. $\textbf{GameOpt}$ establishes a cooperative game between the different optimization variables and selects points that are game $\textit{equilibria}$ of an upper confidence bound acquisition function. These are stable configurations from which no variable has an incentive to deviate, analogous to local optima in continuous domains. Crucially, this allows us to efficiently break down the complexity of the combinatorial domain into individual decision sets, making $\textbf{GameOpt}$ scalable to large combinatorial spaces. We demonstrate the application of $\textbf{GameOpt}$ to the challenging $\textit{protein design}$ problem and validate its performance on four real-world protein datasets. Each protein can take up to $20^{X}$ possible configurations, where $X$ is the length of the protein, making standard BO methods infeasible. Instead, our approach iteratively selects informative protein configurations and discovers highly active protein variants very quickly compared to other baselines.
贝叶斯优化(BO)是一种通过序列交互优化黑色盒子的昂贵函数的强大框架。在几个重要问题(如药物发现、电路设计、神经架构搜索等)中,尽管这些函数定义在大型的$\textit{组合和无结构}$空间中,但实际应用中,这些空间是难以求解的最大化问题。因此,现有的BO算法在这种情况下是不可行的,因为这些领域中收购函数的求解是无限大的。为解决这个问题,我们提出了$\textbf{GameOpt}$,一种新颖的游戏理论方法,用于求解组合BO。$\textbf{GameOpt}$建立了一个合作游戏,将不同的优化变量之间联系起来,并选择具有上位信心度界收购函数游戏平衡的点。这些是稳定的配置,在这些领域中,没有变量有动机偏离$-$类似于连续领域的局部最优解。关键的是,这使得我们能够将组合域的复杂性分解为单个决策集,从而使$\textbf{GameOpt}$具有对大型组合空间的扩展性。我们用$\textbf{GameOpt}$解决了具有挑战性的蛋白质设计问题,并在四个真实世界蛋白质数据集上验证了其性能。每个蛋白质可以有高达$20^X$种配置,其中$X$是蛋白质的长度,使得标准BO方法不可行。相反,我们的方法通过迭代选择有信息的蛋白质配置,并与其他基线相比,很快就发现了高度活跃的蛋白质变体。
https://arxiv.org/abs/2409.18582
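The equilibrium idea in the abstract above can be sketched with best-response dynamics: each position in the sequence acts as a player, all players share the same payoff (the acquisition value), and each in turn switches to its best choice while the others stay fixed. This is an illustration only. The abstract does not say how GameOpt computes equilibria, so the best-response loop is an assumption, and `toy_ucb`, `ALPHABET`, and `TARGET` are hypothetical stand-ins for a fitted surrogate model.

```python
ALPHABET = "ACDE"  # toy alphabet; real proteins use 20 amino acids
TARGET = "DACE"    # hypothetical optimum, used only to build a toy surrogate


def toy_ucb(x, beta=0.5):
    # Stand-in for an upper confidence bound: posterior mean plus beta times std.
    matches = sum(a == b for a, b in zip(x, TARGET))
    mean = float(matches)
    std = 1.0 / (1.0 + matches)
    return mean + beta * std


def best_response_equilibrium(acq, x0, choices, max_rounds=100):
    x = list(x0)
    for _ in range(max_rounds):
        changed = False
        for i in range(len(x)):
            best_val, best_choice = acq(x), x[i]
            for c in choices:
                val = acq(x[:i] + [c] + x[i + 1:])
                if val > best_val:  # strict improvement guarantees termination
                    best_val, best_choice = val, c
            if best_choice != x[i]:
                x[i] = best_choice
                changed = True
        if not changed:
            break  # no player wants to deviate: a pure equilibrium
    return x


def is_equilibrium(acq, x, choices):
    # True if no single-position change increases the acquisition value.
    x = list(x)
    base = acq(x)
    return all(
        acq(x[:i] + [c] + x[i + 1:]) <= base
        for i in range(len(x))
        for c in choices
    )


if __name__ == "__main__":
    point = best_response_equilibrium(toy_ucb, list("AAAA"), ALPHABET)
    print("".join(point), is_equilibrium(toy_ucb, point, ALPHABET))
```

Each round costs one acquisition evaluation per position and candidate letter, rather than one per full configuration, which is the decomposition into individual decision sets that the abstract refers to.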
The explosive development of deep learning has successfully automated feature extraction in machine learning. However, the structure and hyperparameters of a deep neural network also make a huge difference to its performance on different tasks. Exploring optimal structures and hyperparameters often involves a great deal of tedious human intervention. A legitimate question, therefore, is whether the search for optimal network structures and hyperparameters can itself be automated. Hyperparameter Optimization automates the search for optimal hyperparameters, while Neural Architecture Search aims to automatically find the best network structure for a given task. In this paper, we first introduce the overall development of Neural Architecture Search and then focus on providing a comprehensive and accessible survey of Neural Architecture Search work that relies on reinforcement learning, including improvements and variants intended to support more complex structures and resource-constrained environments.
机器学习特征提取的自动化成功实现是通过深度学习的爆炸式发展实现的。然而,深度神经网络架构的结构和超参数也对不同任务的性能产生巨大影响。探索最优结构和超参数的过程通常需要大量无聊的人干预。因此,一个合理的疑问是要求自动化搜索最优网络结构和超参数。自动化探索最优超参数的工作是由Hyperparameter Optimization完成的。神经网络架构搜索旨在自动找到特定任务下的最佳网络结构。在本文中,我们首先介绍了神经网络架构搜索的整体发展,然后主要关注提供与强化学习相关的神经网络架构搜索工作的一般性和可理解性调查,包括基于满足更复杂结构和资源不足环境的改进和变体。
https://arxiv.org/abs/2409.18163
Diffusion models are cutting-edge generative models adept at producing diverse, high-quality images. Despite their effectiveness, these models often require significant computational resources owing to their numerous sequential denoising steps and the significant inference cost of each step. Recently, Neural Architecture Search (NAS) techniques have been employed to automatically search for faster generation processes. However, NAS for diffusion is inherently time-consuming as it requires estimating thousands of diffusion models to search for the optimal one. In this paper, we introduce Flexiffusion, a novel training-free NAS paradigm designed to accelerate diffusion models by concurrently optimizing generation steps and network structures. Specifically, we partition the generation process into isometric step segments, each sequentially composed of a full step, multiple partial steps, and several null steps. The full step computes all network blocks, while the partial step involves part of the blocks, and the null step entails no computation. Flexiffusion autonomously explores flexible step combinations for each segment, substantially reducing search costs and enabling greater acceleration compared to the state-of-the-art (SOTA) method for diffusion models. Our searched models reported speedup factors of $2.6\times$ and $1.5\times$ for the original LDM-4-G and the SOTA, respectively. The factors for Stable Diffusion V1.5 and the SOTA are $5.1\times$ and $2.0\times$. We also verified the performance of Flexiffusion on multiple datasets, and positive experiment results indicate that Flexiffusion can effectively reduce redundancy in diffusion models.
扩散模型是 cutting-edge 的生成模型,擅长生成多样、高质量的图像。尽管这些模型非常有效,但它们通常需要大量的计算资源,因为它们需要进行多个序列去噪步骤,并且每一步的推理成本都相当高。最近,神经架构搜索(NAS)技术被用于自动寻找更快的生成过程。然而,为扩散模型使用NAS始终是耗时的,因为它需要估算成千上万个扩散模型来寻找最优的一个。在本文中,我们介绍了 Flexifusion,一种新型的无需训练的自NAS范式,旨在通过同时优化生成步骤和网络结构来加速扩散模型。具体来说,我们将生成过程划分为等距的步骤段,每个步骤段都是依次由一个完整的步骤、多个部分步骤和一个零步骤组成的。完整的步骤计算所有的网络块,部分步骤包括部分块,零步骤包括不进行计算。Flexifusion 自主探索每个步骤段的灵活组合,大幅减少了搜索成本,并相比于最先进的扩散模型方法实现了更大的加速。我们的搜索模型在原始 LDM-4-G 和 SOTA 上的速度提升因子分别为 2.6 和 1.5。Stable Diffusion V1.5 和 SOTA 的因子分别为 5.1 和 2.0。我们还验证了 Flexifusion 在多个数据集上的性能,并且实验结果表明,Flexifusion 能有效减少扩散模型的冗余。
https://arxiv.org/abs/2409.17566
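The segment structure described in the abstract above (one full step, then partial steps, then null steps within equal-length segments) can be made concrete with a small schedule builder and a rough cost estimate. This is a sketch under assumptions: the function names, the `partial_fraction` cost model, and the example values are hypothetical, and the search over step combinations that Flexiffusion performs is not shown.

```python
def build_schedule(segment_len, partials_per_segment):
    # One entry in `partials_per_segment` per segment; each segment is laid out as
    # [full, partial * k, null * (segment_len - 1 - k)].
    schedule = []
    for k in partials_per_segment:
        if not 0 <= k <= segment_len - 1:
            raise ValueError("partial steps must fit in the segment after the full step")
        schedule += ["full"] + ["partial"] * k + ["null"] * (segment_len - 1 - k)
    return schedule


def relative_cost(schedule, partial_fraction=0.5):
    # Cost relative to running every step in full. `partial_fraction` is the assumed
    # share of network blocks computed in a partial step; null steps cost nothing.
    cost = {"full": 1.0, "partial": partial_fraction, "null": 0.0}
    return sum(cost[s] for s in schedule) / len(schedule)


if __name__ == "__main__":
    sched = build_schedule(segment_len=4, partials_per_segment=[1, 2])
    print(sched)
    print(relative_cost(sched))
```

Under this simple cost model, a candidate schedule's estimated speedup is the reciprocal of `relative_cost`; any real estimate would need measured per-step latencies.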
Large Language Models (LLMs) have long held sway in the realms of artificial intelligence research. Numerous efficient techniques, including weight pruning, quantization, and distillation, have been embraced to compress LLMs, targeting memory reduction and inference acceleration, which underscore the redundancy in LLMs. However, most model compression techniques concentrate on weight optimization, overlooking the exploration of optimal architectures. Besides, traditional architecture search methods, limited by the elevated complexity with extensive parameters, struggle to demonstrate their effectiveness on LLMs. In this paper, we propose a training-free architecture search framework to identify optimal subnets that preserve the fundamental strengths of the original LLMs while achieving inference acceleration. Furthermore, after generating subnets that inherit specific weights from the original LLMs, we introduce a reformation algorithm that utilizes the omitted weights to rectify the inherited weights with a small amount of calibration data. Compared with SOTA training-free structured pruning works that can generate smaller networks, our method demonstrates superior performance across standard benchmarks. Furthermore, our generated subnets can directly reduce the usage of GPU memory and achieve inference acceleration.
大语言模型(LLMs)在人工智能研究领域拥有广泛的影响力。为了压缩LLMs,包括权重修剪、量化技术和去梯度,已经采用了多种有效方法。这些方法的目标是降低内存使用并提高推理加速,这凸显了LLMs中的冗余。然而,大多数模型压缩技术都集中在权重优化上,而忽略了探索最优架构。此外,传统的架构搜索方法,由于具有广泛参数的复杂性,很难在LLMs上展示其有效性。在本文中,我们提出了一个无需训练的架构搜索框架,用于识别保留原始LLM基本优势的最佳子网络,同时实现推理加速。此外,在生成具有特定权重的子网络之后,我们引入了一种平滑算法,利用忽略的权重来纠正继承的权重,并且需要很少的调参数据。与具有更小网络规模的训练-free结构化剪枝工作相比,我们的方法在标准基准测试中的性能优越。此外,我们生成的子网络可以直接减少GPU内存使用并实现推理加速。
https://arxiv.org/abs/2409.17372
Due to intensive genetic selection for rapid growth rates and high broiler yields in recent years, the global poultry industry has faced a challenging problem in the form of woody breast (WB) conditions. This condition has caused significant economic losses, as high as $200 million annually, and the root cause of WB has yet to be identified. Human palpation is the most common method of distinguishing WB fillets from others. However, this method is time-consuming and subjective. Hyperspectral imaging (HSI) combined with machine learning algorithms can evaluate the WB conditions of fillets in a non-invasive, objective, and high-throughput manner. In this study, 250 raw chicken breast fillet samples (normal, mild, severe) were taken, and the spatially heterogeneous hardness distribution was considered for the first time when designing HSI processing models. The study not only classified the WB levels from HSI but also built a regression model to correlate the spectral information with sample hardness data. To achieve satisfactory classification and regression, a wide-deep neural network model enabled by neural architecture search (NAS), named NAS-WD, was developed. In NAS-WD, NAS was first used to automatically optimize the network architecture and hyperparameters. The classification results show that NAS-WD can classify the three WB levels with an overall accuracy of 95%, outperforming the traditional machine learning model, and the regression correlation between the spectral data and hardness was 0.75, significantly better than that of traditional regression models.
由于近年来快速生长速率和高鸡饲养产量的选择压力,全球家禽业面临了一个具有挑战性的问题,即木质心边(WB)状况。这种状况每年给产业带来高达200亿美元的经济损失,而WB的根源尚未被确定。人触诊是最常见的区分WB与其他情况的诊断方法。然而,这种方法耗时且主观。结合超光谱成像(HSI)与机器学习算法,可以非侵入性地、客观且高效率地评估鸡胸肉片中的WB状况。在本研究中,取了250个原始鸡肉胸肉样品(正常、轻度、严重),在设计HSI处理模型时首先考虑了空间异质硬度分布。研究不仅对HSI中的WB水平进行了分类,还构建了一个回归模型来将光谱信息与样品硬度数据相关联。为了实现令人满意的分类和回归模型,神经网络架构搜索(NAS)使得名为NAS-WD的广泛深度神经网络模型得以开发。在NAS-WD中,NAS首先用于自动优化网络架构和超参数。分类结果显示,NAS-WD可以对三个WB水平进行分类,总体准确率为95%,超过传统机器学习模型,且光谱数据的回归相关系数为0.75,远优于传统回归模型。
https://arxiv.org/abs/2409.17210
Spatio-temporal forecasting is a critical component of various smart city applications, such as transportation optimization, energy management, and socio-economic analysis. Recently, several automated spatio-temporal forecasting methods have been proposed to automatically search the optimal neural network architecture for capturing complex spatio-temporal dependencies. However, the existing automated approaches suffer from expensive neural architecture search overhead, which hinders their practical use and the further exploration of diverse spatio-temporal operators in a finer granularity. In this paper, we propose AutoSTF, a decoupled automatic neural architecture search framework for cost-effective automated spatio-temporal forecasting. From the efficiency perspective, we first decouple the mixed search space into temporal space and spatial space and respectively devise representation compression and parameter-sharing schemes to mitigate the parameter explosion. The decoupled spatio-temporal search not only expedites the model optimization process but also leaves new room for more effective spatio-temporal dependency modeling. From the effectiveness perspective, we propose a multi-patch transfer module to jointly capture multi-granularity temporal dependencies and extend the spatial search space to enable finer-grained layer-wise spatial dependency search. Extensive experiments on eight datasets demonstrate the superiority of AutoSTF in terms of both accuracy and efficiency. Specifically, our proposed method achieves up to 13.48x speed-up compared to state-of-the-art automatic spatio-temporal forecasting methods while maintaining the best forecasting accuracy.
空间时序预测是各种智能城市应用程序的关键组件,如交通优化、能源管理和社会经济分析。最近,提出了几种自动空间时序预测方法,用于自动搜索最优神经网络架构,以捕捉复杂的空间时序依赖关系。然而,现有的自动方法在神经网络架构搜索过程中存在昂贵的成本,这阻碍了它们的实际应用和在更高粒度上对多种空间时序操作的深入探索。在本文中,我们提出了AutoSTF,一种用于经济高效自动空间时序预测的解耦自动神经网络架构搜索框架。从效率角度来看,我们首先将混合搜索空间解耦为时间和空间,并分别设计表示压缩和参数共享方案,以减轻参数爆炸。解耦的空间时序搜索不仅加速了模型优化过程,而且为更有效的空间时序依赖建模留下了新的空间。从效果角度来看,我们提出了一个多补丁传输模块,共同捕获多粒度时间依赖关系,并将空间搜索空间扩展,以实现细粒度层间空间依赖搜索。在八个数据集上的广泛实验证明,AutoSTF在准确性和效率方面都具有优越性。具体来说,与最先进的自动空间时序预测方法相比,我们的方法实现了高达13.48倍的速度提升,同时保持了最佳的预测准确性。
https://arxiv.org/abs/2409.16586
Generating 3D human gestures and speech from a text script is critical for creating realistic talking avatars. One solution is to leverage separate pipelines for text-to-speech (TTS) and speech-to-gesture (STG), but this approach suffers from poor alignment of speech and gestures and slow inference times. In this paper, we introduce FastTalker, an efficient and effective framework that simultaneously generates high-quality speech audio and 3D human gestures at high inference speeds. Our key insight is reusing the intermediate features from speech synthesis for gesture generation, as these features contain more precise rhythmic information than features re-extracted from generated speech. Specifically, 1) we propose an end-to-end framework that concurrently generates speech waveforms and full-body gestures, using intermediate speech features such as pitch, onset, energy, and duration directly for gesture decoding; 2) we redesign the causal network architecture to eliminate dependencies on future inputs for real applications; 3) we employ Reinforcement Learning-based Neural Architecture Search (NAS) to enhance both performance and inference speed by optimizing our network architecture. Experimental results on the BEAT2 dataset demonstrate that FastTalker achieves state-of-the-art performance in both speech synthesis and gesture generation, processing speech and gestures in 0.17 seconds per second on an NVIDIA 3090.
从文本脚本中生成3D人类手势和 speech 是一种关键方法,以创建真实的聊天虚拟助手。一种解决方案是利用分别处理文本到语音(TTS)和语音到手势(STG)的单独管道,但这种方法存在语音和手势同步不良和推理速度较慢的问题。在本文中,我们介绍了 FastTalker,一种高效且有效的框架,可以在高推理速度下同时生成高质量的 speech 音频和 3D 人类手势。我们的关键洞见是重用 speech synthesis 中的中间特征进行手势生成,因为这些特征包含比从生成的 speech 中提取的特征更精确的节奏信息。具体来说,1)我们提出了一个端到端的框架,使用中间 speech 特征(如 pitch、onset、energy 和 duration)同时生成语音波形和全身手势;2)我们重新设计了因果网络架构,以消除对真实应用未来输入的依赖;3)我们采用强化学习为基础的神经网络架构搜索(NAS)来通过优化网络架构提高性能和推理速度。在 BEAT2 数据集上的实验结果表明,FastTalker 在语音合成和手势生成方面都实现了最先进的性能,处理语音和手势的时间为每秒 0.17 秒。
https://arxiv.org/abs/2409.16404
Inference-time techniques are emerging as highly effective tools to increase large language model (LLM) capabilities. However, there is still limited understanding of the best practices for developing systems that combine inference-time techniques with one or more LLMs, with challenges including: (1) effectively allocating inference compute budget, (2) understanding the interactions between different combinations of inference-time techniques and their impact on downstream performance, and (3) efficiently searching over the large space of model choices, inference-time techniques, and their compositions. To address these challenges, we introduce Archon, an automated framework for designing inference-time architectures. Archon defines an extensible design space, encompassing methods such as generation ensembling, multi-sampling, ranking, fusion, critiquing, verification, and unit testing. It then transforms the problem of selecting and combining LLMs and inference-time techniques into a hyperparameter optimization objective. To optimize this objective, we introduce automated Inference-Time Architecture Search (ITAS) algorithms. Given target benchmark(s), an inference compute budget, and available LLMs, ITAS outputs optimized architectures. We evaluate Archon architectures across a wide range of instruction-following and reasoning benchmarks, including MT-Bench, Arena-Hard-Auto, AlpacaEval 2.0, MixEval, MixEval Hard, MATH, and CodeContests. We show that inference-time architectures automatically designed by Archon outperform strong models such as GPT-4o and Claude 3.5 Sonnet on these benchmarks, achieving an average increase of 14.1 and 10.3 percentage points with all-source models and open-source models, respectively. We make our code and datasets available publicly on GitHub: this https URL.
推理时间技术正在成为提高大型语言模型(LLM)能力的高效工具。然而,对于将推理时间技术与一个或多个LLM相结合的开发系统的最佳实践仍存在有限的了解,包括:(1)有效分配推理计算预算,(2)理解不同推理时间技术与其对下游性能的影响之间的交互作用,以及(3)高效地搜索模型选择、推理时间技术和它们的组合的大型空间。为了应对这些挑战,我们介绍了Archon,一个自动设计推理时间架构的框架。Archon定义了一个可扩展的设计空间,包括生成集成、多采样、排名、融合、批判性分析、验证和单元测试等方法。然后,它将选择和组合LLM和推理时间技术的问题转化为一个超参数优化目标。为了优化这个目标,我们介绍了自动推理时间架构搜索(ITAS)算法。给定目标基准(或多个)、推理计算预算和可用的LLM,ITAS输出优化架构。我们在包括MT-Bench、Arena-Hard-Auto、AlpacaEval 2.0、MixEval、MixEval Hard、MATH和CodeContests在内的广泛指令跟随和推理基准上评估Archon架构。我们证明了由Archon自动设计的推理时间架构在基准上优于强大的模型,如GPT-4o和Claude 3.5 Sonnet,实现了所有源模型和开源模型的平均增长14.1%和10.3%。我们将我们的代码和数据公开发布在Github上:此链接。
https://arxiv.org/abs/2409.15254
Differentiable architecture search (DARTS) has emerged as a promising technique for effective neural architecture search, and it mainly contains two steps to find the high-performance architecture: First, the DARTS supernet that consists of mixed operations will be optimized via gradient descent. Second, the final architecture will be built by the selected operations that contribute the most to the supernet. Although DARTS improves the efficiency of NAS, it suffers from the well-known degeneration issue which can lead to deteriorating architectures. Existing works mainly attribute the degeneration issue to the failure of its supernet optimization, while little attention has been paid to the selection method. In this paper, we cease to apply the widely-used magnitude-based selection method and propose a novel criterion based on operation strength that estimates the importance of an operation by its effect on the final loss. We show that the degeneration issue can be effectively addressed by using the proposed criterion without any modification of supernet optimization, indicating that the magnitude-based selection method can be a critical reason for the instability of DARTS. The experiments on NAS-Bench-201 and DARTS search spaces show the effectiveness of our method.
差分架构搜索(DARTS)作为一种有效的神经架构搜索技术,主要包括两个步骤来寻找高性能架构:首先,通过梯度下降对DARTS超级网络(由混合操作组成)进行优化。其次,通过选择对超网络贡献最大的操作,构建最终架构。尽管DARTS提高了NAS的效率,但它仍然受到众所周知的不稳定问题,可能导致架构恶化。现有的工作主要将不稳定性问题归因于其超网络优化失败,而对其选择方法却关注不足。在本文中,我们放弃了通常使用的基于大小的选择方法,并提出了一个基于操作强度的新标准,该标准通过操作对最终损失的影响来估计操作的重要性。我们证明了,在没有对超网络优化进行修改的情况下使用所提出的标准可以有效解决不稳定性问题,表明基于大小的选择方法可能是DARTS不稳定的关键原因。在NAS-Bench-201和DARTS搜索空间上的实验表明,我们的方法的有效性得到了验证。
https://arxiv.org/abs/2409.14433
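The contrast in the abstract above, between magnitude-based selection and selection by an operation's effect on the final loss, can be shown on a one-edge toy supernet. This is a sketch under assumptions: the abstract does not give the exact operation-strength criterion, so the loss increase after removing an operation is used as a stand-in, and the operations, architecture parameters, and data are invented for illustration.

```python
import math

# Toy candidate operations on a scalar input (hypothetical).
OPS = {"skip": lambda x: x, "conv": lambda x: 3.0 * x, "zero": lambda x: 0.0}


def softmax(alphas):
    m = max(alphas.values())
    exps = {k: math.exp(v - m) for k, v in alphas.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def mixed_loss(alphas, data, removed=None):
    # Mean squared error of the softmax-weighted mixed operation, optionally
    # with one operation's contribution dropped.
    weights = softmax(alphas)
    loss = 0.0
    for x, y in data:
        out = sum(w * OPS[k](x) for k, w in weights.items() if k != removed)
        loss += (out - y) ** 2
    return loss / len(data)


def select_by_magnitude(alphas):
    # Widely used rule: keep the operation with the largest architecture parameter.
    return max(alphas, key=alphas.get)


def select_by_strength(alphas, data):
    # Assumed criterion: keep the operation whose removal increases the loss most.
    base = mixed_loss(alphas, data)
    strength = {k: mixed_loss(alphas, data, removed=k) - base for k in alphas}
    return max(strength, key=strength.get)


if __name__ == "__main__":
    alphas = {"skip": 1.0, "conv": 0.8, "zero": 0.0}   # skip has the largest alpha
    data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0)]     # target matches the conv op
    print(select_by_magnitude(alphas), select_by_strength(alphas, data))
```

In this toy setting the skip connection carries the largest architecture parameter even though the convolution-like operation contributes more to fitting the target, so the two rules pick different operations.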
Eye movement biometrics has received increasing attention thanks to its highly secure identification. Although deep learning (DL) models have recently been applied successfully to eye movement recognition, the DL architecture is still determined by human prior knowledge. Differentiable Neural Architecture Search (DARTS) automates the manual process of architecture design with high search efficiency. DARTS, however, usually stacks multiple copies of the same learned cell to form the final neural network for evaluation, which limits the diversity of the network. In addition, DARTS usually searches the architecture in a shallow network while evaluating it in a deeper one, which results in a large gap between the architecture depths in the search and evaluation scenarios. To address these issues, we propose EM-DARTS, a hierarchical differentiable architecture search algorithm that automatically designs the DL architecture for eye movement recognition. First, we define a supernet and propose a global and local alternate Neural Architecture Search method that searches for the optimal architecture by alternating between the two with a differentiable neural architecture search. The local search strategy aims to find an optimal architecture for different cells, while the global search strategy is responsible for optimizing the architecture of the target network. To further reduce redundancy, a transfer entropy is proposed to compute the information amount of each layer, so as to further simplify the search network. Our experiments on three public databases demonstrate that the proposed EM-DARTS is capable of producing an optimal architecture that leads to state-of-the-art recognition performance.
眼动生物识别因其高安全识别而得到了越来越多的关注。虽然最近深度学习(DL)模型已经在眼动识别方面取得了成功,但DL架构仍然是由人类先验知识决定的。不同可导神经网络架构搜索(DARTS)自动通过高搜索效率的手动过程设计神经网络架构。然而,DARTS通常将相同的多层学习细胞堆叠在一起形成一个最终的神经网络,从而限制了网络的多样性。意外的是,DARTS通常在浅层网络上评估架构,而在深层网络上评估架构,这导致在搜索和评估场景中架构深度之间的差距较大。为了解决这个问题,我们提出了EM-DARTS,一种自适应的神经网络架构搜索算法,用于自动设计眼动识别的DL架构。首先,我们定义了一个超网络,并提出了一个全局和局部交替神经网络架构搜索方法,以交替使用不同可导神经网络架构搜索来寻找最优架构。局部搜索策略旨在找到不同细胞的最佳架构,而全局搜索策略负责优化目标网络的架构。为了进一步减少冗余,我们提出了传输熵来计算每个层的信息量,从而进一步简化搜索网络。我们对三个公开数据库的实验证明表明,EM-DARTS能够产生最优架构,从而实现最先进的识别性能。
https://arxiv.org/abs/2409.14432
Eye movement biometrics is a secure and innovative identification method. Deep learning methods have shown good performance, but their network architectures rely on manual design and prior knowledge. To address these issues, we introduce automated network search (NAS) algorithms to the field of eye movement recognition and present Relax DARTS, an improvement on Differentiable Architecture Search (DARTS) that realizes more efficient network search and training. The key idea is to circumvent the issue of weight sharing by independently training the architecture parameters $\alpha$ to achieve a more precise target architecture. Moreover, the introduction of module input weights $\beta$ gives cells the flexibility to select inputs, which alleviates overfitting and improves model performance. Results on four public databases demonstrate that Relax DARTS achieves state-of-the-art recognition performance. Notably, Relax DARTS exhibits adaptability to other multi-feature temporal classification tasks.
眼动生物识别是一种安全和创新的身份识别方法。深度学习方法显示出良好的性能,但它们的网络架构依赖于手动设计,并且结合了先验知识。为解决这些问题,我们将自动网络搜索(NAS)算法引入眼动识别领域,并提出了Relax DARTS,这是基于Differentiable Architecture Search(DARTS)的改进,以实现更有效的网络搜索和训练。关键想法是独立训练架构参数$\alpha$,以绕过权重共享的问题,实现更精确的目标架构。此外,引入模块输入权重$\beta$,使得细胞具有选择输入的灵活性,减轻过拟合现象,提高模型性能。在四个公开数据库上的结果表明,Relax DARTS实现了最先进的识别性能。值得注意的是,Relax DARTS表现出对其他多特征时序分类任务的适应性。
https://arxiv.org/abs/2409.11652
This study presents a novel method for improving rice disease classification using 8 different convolutional neural network (CNN) algorithms, which will further the field of precision agriculture. It is paired with a Tkinter-based application that offers farmers a feature-rich interface. By enabling real-time disease prediction and providing personalized recommendations, the application helps farmers make timely and well-informed decisions. Together with the user-friendly Tkinter interface, the integration of CNN transfer learning models, including ResNet-50, InceptionV3, VGG16, and MobileNetV2, with the UCI dataset represents a major advancement toward modernizing agricultural practices and guaranteeing sustainable crop management. Reported outcomes include 75% accuracy for ResNet-50, 90% for DenseNet121, 84% for VGG16, 95.83% for MobileNetV2, 91.61% for DenseNet169, and 86% for InceptionV3. These results give a concise summary of the models' capabilities, assisting researchers in choosing appropriate strategies for precise and successful rice crop disease identification. Severe overfitting was observed on VGG19 (70% accuracy) and NASNet (80.02% accuracy). ResNet101 reached only 54% accuracy, and EfficientNetB0 only 33%. A MobileNetV2-trained model was successfully deployed in a Tkinter GUI application to make predictions from images or real-time video capture.
这项研究提出了一种改进利用8种卷积神经网络(CNN)算法提高水稻病害分类的新方法,这将进一步推动精确农业领域的发展。基于Tkinter的应用程序为农民提供了一个功能丰富的界面。通过这个先进的应用程序,农民可以通过实时疾病预测和提供个性化的建议来做出及时和明智的决策。与用户友好的Tkinter界面相结合,将先进的CNN迁移学习算法与包括ResNet-50、InceptionV3、VGG16和MobileNetv2的UCI数据集的平滑集成代表了迈向现代农业实践和确保可持续作物管理的重大进展。令人印象深刻的结果包括:ResNet-50的准确率为75%,DenseNet121的准确率为90%,VGG16的准确率为84%,MobileNetV2的准确率为95.83%,DenseNet169的准确率为91.61%,InceptionV3的准确率为86%。这些结果简要地总结了模型的能力,帮助研究人员选择适当的策略进行精确和成功的水稻病害识别。在VGG19上,70%的准确率显示了严重的过拟合。在Nasnet上,80.02%的准确率也无法实现。在Renset101上,只有54%的准确率,效率NetB0也只有33%的准确率。使用经过 MobileNetV2 训练的模型,在 Tkinter GUI 应用程序中成功部署,用于通过图像或实时视频捕获进行预测。
https://arxiv.org/abs/2410.01827
In this paper, we combine image classification and image denoising, aiming to enhance human perception of noisy images captured by edge devices such as low-light security cameras. In such settings, it is important to retain the ability of humans to verify the automatic classification decision, so we denoise the image alongside classifying it. Since edge devices have little computational power, we explicitly optimize for efficiency by proposing a novel architecture that integrates the two tasks. Additionally, we adapt a Neural Architecture Search (NAS) method that searches for classifiers so that it searches for the integrated model instead, while optimizing for a target latency, classification accuracy, and denoising performance. The NAS architectures outperform our manually designed alternatives in both denoising and classification, offering a significant improvement to human perception. Our approach empowers users to construct architectures tailored to domains like medical imaging, surveillance systems, and industrial inspections.
在本文中,我们共同结合图像分类和图像去噪,旨在提高边缘设备(如低光安全摄像头)捕获到的噪声图像中人类的感知。在这种设置中,保留人类验证自动分类决策的能力非常重要,从而共同去噪以提高人类的感知。由于边缘设备具有较少的计算能力,我们通过提出一种新颖的架构,将两种任务整合在一起,明显优化了效率。此外,我们改变了一个神经架构搜索(NAS)方法,该方法在优化目标延迟、分类准确性和去噪性能的同时,寻找分类器。与我们的自定义选项相比,NAS架构在去噪和分类方面都表现出色,显著提高了人类的感知。我们的方法使用户能够构建适应领域,如医学成像、监控系统和工业检查的架构。
https://arxiv.org/abs/2409.08943
The selection of algorithms is a crucial step in designing AI services for real-world time series classification use cases. Traditional methods such as neural architecture search, automated machine learning, combined algorithm selection, and hyperparameter optimization are effective but require considerable computational resources and access to all data points to run their optimizations. In this work, we introduce a novel data fingerprint that describes any time series classification dataset in a privacy-preserving manner and provides insight into the algorithm selection problem without requiring training on the (unseen) dataset. By decomposing the multi-target regression problem, only our data fingerprints are used to estimate algorithm performance and uncertainty in a scalable and adaptable manner. Our approach is evaluated on the 112 University of California, Riverside (UCR) benchmark datasets, demonstrating its effectiveness in predicting the performance of 35 state-of-the-art algorithms and providing valuable insights for effective algorithm selection in time series classification service systems, improving on a naive baseline by 7.32% on average in estimating the mean performance and by 15.81% in estimating the uncertainty.
算法选择是设计真实时间序列分类应用所需AI服务的关键步骤。传统方法(如神经网络架构搜索、自动机器学习、联合算法选择和超参数优化)虽然有效,但需要大量的计算资源,并且需要访问所有数据点来运行优化。在本文中,我们引入了一种新的数据指纹,以隐私保护的方式描述任何时间序列分类数据集,并提供了在不需要训练(未见)数据集的情况下解决算法选择问题的洞察。通过分解多目标回归问题,仅使用我们的数据指纹来估计算法的性能和不确定性,以规模和可扩展性为前提。我们对112个加州大学河岸大学基准数据集进行了评估,证明了其在预测35个最先进算法性能方面的高效性,并为时间序列分类服务系统有效算法选择提供了有价值的见解,将 naive 基线估计提高了7.32%。在估计均性能和不确定性方面,平均提高了15.81%。
https://arxiv.org/abs/2409.08636
With the rise of deep learning technology in practical applications, Convolutional Neural Networks (CNNs) have been able to assist humans in solving many real-world problems. To enhance the performance of CNNs, numerous network architectures have been explored. Some of these architectures are designed based on the accumulated experience of researchers over time, while others are designed through neural architecture search methods. The improvements these methods bring to CNNs are significant, but in practice most of them are limited by model size and environmental constraints, making it difficult to fully realize the improved performance. In recent years, research has found that many CNN structures can be explained as discretizations of ordinary differential equations. This implies that we can design theoretically supported deep network structures using higher-order numerical difference methods. It should be noted that most previous CNN model structures are based on low-order numerical methods. Therefore, considering that the accuracy of linear multi-step numerical difference methods is higher than that of the forward Euler method, this paper proposes a stacking scheme based on the linear multi-step method. This scheme enhances the performance of ResNet without increasing the model size, and we compare it with the Runge-Kutta scheme. The experimental results show that the performance of the proposed stacking scheme is superior to that of existing stacking schemes (ResNet and HO-ResNet), and that it can be extended to other types of neural networks.
https://arxiv.org/abs/2409.04977
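The abstract above relies on the reading of a residual block x_{n+1} = x_n + f(x_n) as one forward Euler step of dx/dt = f(x), and argues that linear multi-step methods are more accurate. The sketch below illustrates that accuracy gap on a toy scalar ODE. It uses the two-step Adams-Bashforth method as one example of a linear multi-step method; the abstract does not say which multi-step scheme the paper adopts, so this choice, the toy function `f`, and the step size are assumptions for illustration only.

```python
import math


def f(x):
    """Toy stand-in for a residual branch: dx/dt = -x, exact solution exp(-t)."""
    return -x


def forward_euler(x0, h, steps):
    """x_{n+1} = x_n + h * f(x_n), the same form as a plain residual block."""
    x = x0
    for _ in range(steps):
        x = x + h * f(x)
    return x


def adams_bashforth2(x0, h, steps):
    """x_{n+1} = x_n + h * (3/2 * f(x_n) - 1/2 * f(x_{n-1}))."""
    x_prev = x0
    x = x0 + h * f(x0)  # bootstrap the first step with forward Euler
    for _ in range(steps - 1):
        # The right-hand side is evaluated with the old x and x_prev.
        x_prev, x = x, x + h * (1.5 * f(x) - 0.5 * f(x_prev))
    return x


exact = math.exp(-1.0)  # solution at t = 1 for x(0) = 1
err_euler = abs(forward_euler(1.0, 0.1, 10) - exact)
err_ab2 = abs(adams_bashforth2(1.0, 0.1, 10) - exact)
print(f"forward Euler error: {err_euler:.5f}, Adams-Bashforth 2 error: {err_ab2:.5f}")
```

Both integrators evaluate `f` once per step, since the two-step update reuses the previous evaluation. In the network analogy this corresponds to reusing the previous block's residual output rather than adding parameters, which is consistent with the abstract's claim of no increase in model size.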
Quick and reliable measurement of wood chip moisture content is a long-standing problem for numerous forest-reliant industries such as biofuel, pulp and paper, and bio-refineries. Moisture content is a critical attribute of wood chips due to its direct relationship with the final product quality. Conventional techniques for determining moisture content, such as oven-drying, have several drawbacks: they are time-consuming, can damage the sample, and are not feasible in real time. Alternative techniques, including NIR spectroscopy, electrical capacitance, X-rays, and microwaves, have demonstrated potential; nevertheless, they are still constrained by issues related to portability, precision, and the expense of the required equipment. Hence, there is a need for a moisture content determination method that is instant, portable, non-destructive, inexpensive, and precise. This study explores the use of deep learning and machine vision to predict moisture content classes from RGB images of wood chips. A large-scale image dataset comprising 1,600 RGB images of wood chips has been collected and annotated with ground truth labels obtained from the oven-drying technique. Two high-performing neural networks, MoistNetLite and MoistNetMax, have been developed leveraging Neural Architecture Search (NAS) and hyperparameter optimization. The developed models are evaluated and compared with state-of-the-art deep learning models. Results demonstrate that MoistNetLite achieves 87% accuracy with minimal computational overhead, while MoistNetMax reaches 91% accuracy in wood chip moisture content class prediction. With improved accuracy and faster prediction speed, our proposed MoistNet models hold great promise for the wood chip processing industry.
https://arxiv.org/abs/2409.04920
The significant computational cost of multiplications hinders the deployment of deep neural networks (DNNs) on edge devices. While multiplication-free models offer enhanced hardware efficiency, they typically sacrifice accuracy. As a solution, multiplication-reduced hybrid models have emerged to combine the benefits of both approaches. In particular, prior works, i.e., NASA and NASA-F, leverage Neural Architecture Search (NAS) to construct such hybrid models, enhancing hardware efficiency while maintaining accuracy. However, they either entail costly retraining or encounter gradient conflicts, limiting both search efficiency and accuracy. Additionally, they overlook the acceleration opportunity introduced by accelerator search, yielding sub-optimal hardware performance. To overcome these limitations, we propose NASH, a Neural architecture and Accelerator Search framework for multiplication-reduced Hybrid models. Specifically, for NAS, we propose a tailored zero-shot metric to pre-identify promising hybrid models before training, enhancing search efficiency while alleviating gradient conflicts. For accelerator search, we introduce a coarse-to-fine search to streamline the search process. Furthermore, we integrate these two levels of search to obtain NASH, which yields the optimal model and accelerator pairing. Experiments validate the effectiveness of our approach: compared with the state-of-the-art multiplication-based system, we achieve $2.14\times$ higher throughput and $2.01\times$ higher FPS with $0.25\%$ higher accuracy on CIFAR-100, and $1.40\times$ higher throughput and $1.19\times$ higher FPS with $0.56\%$ higher accuracy on Tiny-ImageNet. Code is available at this https URL.
https://arxiv.org/abs/2409.04829
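To make the notion of a multiplication-free operation in the abstract above concrete, the sketch below shows one common way to avoid multiplications: rounding a weight to a signed power of two so that multiplying an integer activation becomes a bit shift. This is a generic illustration, not NASH's operator set, which the abstract does not spell out; the helper names `to_power_of_two` and `shift_multiply` are hypothetical.

```python
import math


def to_power_of_two(w):
    """Round a nonzero weight to sign * 2**k and return (sign, k)."""
    sign = 1 if w > 0 else -1
    k = round(math.log2(abs(w)))
    return sign, k


def shift_multiply(x, sign, k):
    """Approximate x * w for an integer x using only a shift and a negation."""
    shifted = x << k if k >= 0 else x >> -k
    return shifted if sign > 0 else -shifted


sign, k = to_power_of_two(3.7)      # 3.7 is rounded to 2**2
approx = shift_multiply(5, sign, k) # 5 << 2
exact = 5 * 3.7
```

The gap between `approx` and `exact` is the rounding error introduced by the power-of-two weight, a small-scale view of the accuracy cost the abstract attributes to multiplication-free models and the motivation for hybrids that keep some multiplications.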