Network Architecture Search, and specifically Regularized Evolution, is a common way to refine the structure of a deep learning model. However, little is known about how models empirically evolve over time, which has implications for designing caching policies, refining the search algorithm for particular applications, and other important use cases. In this work, we algorithmically analyze and quantitatively characterize the patterns of model evolution for a set of models from the Candle project and the NAS-Bench-201 search space. We show how the evolution of the model structure is influenced by the regularized evolution algorithm. We describe how evolutionary patterns appear in distributed settings and the opportunities they create for caching and improved scheduling. Lastly, we describe the conditions that affect when particular model architectures rise and fall in popularity, based on how frequently they act as a donor within a sliding window.
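For context, the Regularized Evolution (aging evolution) loop whose behavior the paper characterizes can be sketched in a few lines. This is a generic sketch, not the paper's code; `random_arch`, `mutate`, and `fitness` are placeholder callables.

```python
import random
from collections import deque

def regularized_evolution(cycles, population_size, sample_size,
                          random_arch, mutate, fitness):
    # FIFO population: the oldest member dies each cycle ("aging"),
    # which is the regularization that gives the algorithm its name.
    population = deque()
    history = []
    for _ in range(population_size):
        arch = random_arch()
        entry = (arch, fitness(arch))
        population.append(entry)
        history.append(entry)
    while len(history) < cycles:
        # Tournament selection: the fittest of a random sample acts
        # as the donor whose mutated copy enters the population.
        tournament = random.sample(list(population), sample_size)
        parent = max(tournament, key=lambda e: e[1])
        child = mutate(parent[0])
        entry = (child, fitness(child))
        population.append(entry)
        history.append(entry)
        population.popleft()  # age out the oldest, not the worst
    return max(history, key=lambda e: e[1])
```

On a toy search space (architectures as integers, fitness peaking at 10), the loop steadily drifts the population toward the optimum, which is the kind of rise-and-fall dynamics the paper measures via donor frequency.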
https://arxiv.org/abs/2309.12576
Hyperspectral Imaging (HSI) serves as a non-destructive spatial spectroscopy technique with a multitude of potential applications. However, a recurring challenge lies in the limited size of the target datasets, impeding exhaustive architecture search. Consequently, when venturing into novel applications, reliance on established methodologies becomes commonplace, in the hope that they exhibit favorable generalization characteristics. Regrettably, this optimism is often unfounded due to the fine-tuned nature of models tailored to specific HSI contexts. To address this predicament, this study introduces an innovative benchmark dataset encompassing three markedly distinct HSI applications: food inspection, remote sensing, and recycling. This comprehensive dataset affords a finer assessment of hyperspectral model capabilities. Moreover, this benchmark facilitates an incisive examination of prevailing state-of-the-art techniques, consequently fostering the evolution of superior methodologies. Furthermore, the enhanced diversity inherent in the benchmark dataset underpins the establishment of a pretraining pipeline for HSI. This pretraining regimen serves to enhance the stability of training processes for larger models. Additionally, a procedural framework is delineated, offering insights into the handling of applications afflicted by limited target dataset sizes.
https://arxiv.org/abs/2309.11122
Recent years have witnessed the prevailing progress of Generative Adversarial Networks (GANs) in image-to-image translation. However, the success of these GAN models hinges on ponderous computational costs and labor-expensive training data. Current efficient GAN learning techniques often fall into two orthogonal aspects: i) model slimming via reduced calculation costs; ii) data/label-efficient learning with fewer training data/labels. To combine the best of both worlds, we propose a new learning paradigm, Unified GAN Compression (UGC), with a unified optimization objective to seamlessly prompt the synergy of model-efficient and label-efficient learning. UGC sets up semi-supervised-driven network architecture search and adaptive online semi-supervised distillation stages sequentially, which formulates a heterogeneous mutual learning scheme to obtain an architecture-flexible, label-efficient, and performance-excellent model.
https://arxiv.org/abs/2309.09310
The recent surge of interest surrounding Multimodal Neural Networks (MM-NN) is attributed to their ability to effectively process and integrate information from diverse data sources. In MM-NN, features are extracted and fused from multiple modalities using adequate unimodal backbones and specific fusion networks. Although this helps strengthen the multimodal information representation, designing such networks is labor-intensive. It requires tuning the architectural parameters of the unimodal backbones, choosing the fusing point, and selecting the operations for fusion. Furthermore, multimodality AI is emerging as a cutting-edge option in Internet of Things (IoT) systems where inference latency and energy consumption are critical metrics in addition to accuracy. In this paper, we propose Harmonic-NAS, a framework for the joint optimization of unimodal backbones and multimodal fusion networks with hardware awareness on resource-constrained devices. Harmonic-NAS involves a two-tier optimization approach for the unimodal backbone architectures and fusion strategy and operators. By incorporating the hardware dimension into the optimization, evaluation results on various devices and multimodal datasets have demonstrated the superiority of Harmonic-NAS over state-of-the-art approaches achieving up to 10.9% accuracy improvement, 1.91x latency reduction, and 2.14x energy efficiency gain.
https://arxiv.org/abs/2309.06612
The complex and unique neural network topology of the human brain formed through natural evolution enables it to perform multiple cognitive functions simultaneously. Automated evolutionary mechanisms of biological network structure inspire us to explore efficient architectural optimization for Spiking Neural Networks (SNNs). Instead of manually designed fixed architectures or hierarchical Network Architecture Search (NAS), this paper evolves SNNs architecture by incorporating brain-inspired local modular structure and global cross-module connectivity. Locally, the brain region-inspired module consists of multiple neural motifs with excitatory and inhibitory connections; Globally, we evolve free connections among modules, including long-term cross-module feedforward and feedback connections. We further introduce an efficient multi-objective evolutionary algorithm based on a few-shot performance predictor, endowing SNNs with high performance, efficiency and low energy consumption. Extensive experiments on static datasets (CIFAR10, CIFAR100) and neuromorphic datasets (CIFAR10-DVS, DVS128-Gesture) demonstrate that our proposed model boosts energy efficiency, archiving consistent and remarkable performance. This work explores brain-inspired neural architectures suitable for SNNs and also provides preliminary insights into the evolutionary mechanisms of biological neural networks in the human brain.
https://arxiv.org/abs/2309.05263
Accurate medical image segmentation, especially for echocardiographic images with unavoidable noise, requires elaborate network design. Compared with manual design, Neural Architecture Search (NAS) realizes better segmentation results due to larger search space and automatic optimization, but most of the existing methods are weak in layer-wise feature aggregation and adopt a ``strong encoder, weak decoder" structure, insufficient to handle global relationships and local details. To resolve these issues, we propose a novel semi-supervised hybrid NAS network for accurate medical image segmentation termed SSHNN. In SSHNN, we creatively use convolution operation in layer-wise feature fusion instead of normalized scalars to avoid losing details, making NAS a stronger encoder. Moreover, Transformers are introduced for the compensation of global context and U-shaped decoder is designed to efficiently connect global context with local features. Specifically, we implement a semi-supervised algorithm Mean-Teacher to overcome the limited volume problem of labeled medical image dataset. Extensive experiments on CAMUS echocardiography dataset demonstrate that SSHNN outperforms state-of-the-art approaches and realizes accurate segmentation. Code will be made publicly available.
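The Mean-Teacher algorithm the abstract mentions keeps a teacher model as an exponential moving average of the student and penalizes disagreement on unlabeled data. A minimal sketch; the decay value and the MSE consistency loss are illustrative choices, not details from the paper.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def ema_update(teacher_params, student_params, decay=0.99):
    # Teacher weights track an exponential moving average of the
    # student's weights after each training step.
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_params, student_params)]

def consistency_loss(student_logits, teacher_logits):
    # Unlabeled images incur a loss pulling the student's predictions
    # toward the (more stable) teacher's predictions.
    diff = softmax(student_logits) - softmax(teacher_logits)
    return float(np.mean(diff ** 2))
```

The total loss on a batch is the supervised loss on labeled images plus a weighted `consistency_loss` on unlabeled images, which is how the limited labeled volume is compensated.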
https://arxiv.org/abs/2309.04672
Multi-exposure image fusion (MEF) has emerged as a prominent solution to address the limitations of digital imaging in representing varied exposure levels. Despite its advancements, the field grapples with challenges, notably the reliance on manual designs for network structures and loss functions, and the constraints of utilizing simulated reference images as ground truths. Consequently, current methodologies often suffer from color distortions and exposure artifacts, further complicating the quest for authentic image representation. In addressing these challenges, this paper presents a Hybrid-Supervised Dual-Search approach for MEF, dubbed HSDS-MEF, which introduces a bi-level optimization search scheme for automatic design of both network structures and loss functions. More specifically, we harness a unique dual research mechanism rooted in a novel weighted structure refinement architecture search. Besides, a hybrid supervised contrast constraint seamlessly guides and integrates with the search process, facilitating a more adaptive and comprehensive search for optimal loss functions. We realize the state-of-the-art performance in comparison to various competitive schemes, yielding a 10.61% and 4.38% improvement in Visual Information Fidelity (VIF) for general and no-reference scenarios, respectively, while providing results with high contrast, rich details and colors.
https://arxiv.org/abs/2309.01113
This work introduces improvements to the stability and generalizability of Cyclic DARTS (CDARTS). CDARTS is a Differentiable Architecture Search (DARTS)-based approach to neural architecture search (NAS) that uses a cyclic feedback mechanism to train search and evaluation networks concurrently. This training protocol aims to optimize the search process by enforcing that the search and evaluation networks produce similar outputs. However, CDARTS introduces a loss function for the evaluation network that is dependent on the search network. The dissimilarity between the loss functions used by the evaluation networks during the search and retraining phases results in a search-phase evaluation network that is a sub-optimal proxy for the final evaluation network that is utilized during retraining. We present ICDARTS, a revised approach that eliminates the dependency of the evaluation network weights upon those of the search network, along with a modified process for discretizing the search network's \textit{zero} operations that allows these operations to be retained in the final evaluation networks. We pair the results of these changes with ablation studies on ICDARTS' algorithm and network template. Finally, we explore methods for expanding the search space of ICDARTS by expanding its operation set and exploring alternate methods for discretizing its continuous search cells. These experiments resulted in networks with improved generalizability and the implementation of a novel method for incorporating a dynamic search space into ICDARTS.
https://arxiv.org/abs/2309.00664
In prediction-based Neural Architecture Search (NAS), performance indicators derived from graph convolutional networks have shown significant success. These indicators, achieved by representing feed-forward structures as component graphs through one-hot encoding, face a limitation: their inability to evaluate architecture performance across varying search spaces. In contrast, handcrafted performance indicators (zero-shot NAS), which use the same architecture with random initialization, can generalize across multiple search spaces. Addressing this limitation, we propose a novel approach for zero-shot NAS using deep learning. Our method employs Fourier sum of sines encoding for convolutional kernels, enabling the construction of a computational feed-forward graph with a structure similar to the architecture under evaluation. These encodings are learnable and offer a comprehensive view of the architecture's topological information. An accompanying multi-layer perceptron (MLP) then ranks these architectures based on their encodings. Experimental results show that our approach surpasses previous methods using graph convolutional networks in terms of correlation on the NAS-Bench-201 dataset and exhibits a higher convergence rate. Moreover, our extracted feature representation trained on each NAS-Benchmark is transferable to other NAS-Benchmarks, showing promising generalizability across multiple search spaces. The code is available at: this https URL
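A rough one-dimensional sketch of a "Fourier sum of sines" encoding: a scalar architecture attribute is mapped to a vector of sinusoidal features whose frequencies, amplitudes, and phases would be learnable in the full method. The paper encodes entire convolutional kernels, and its exact parameterization may differ.

```python
import numpy as np

def fourier_sine_encoding(x, freqs, amps, phases):
    # Map a scalar attribute x to a feature vector of sines.
    # freqs/amps/phases are fixed here for illustration; in the
    # paper's setting they are learnable encoding parameters.
    return amps * np.sin(freqs * float(x) + phases)
```

An MLP ranker, as described in the abstract, would then consume these encodings assembled over the architecture's feed-forward graph.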
https://arxiv.org/abs/2308.16775
Graph neural networks (GNNs) are powerful tools for performing data science tasks in various domains. Although we use GNNs in wide application scenarios, it is a laborious task for researchers and practitioners to design/select optimal GNN architectures in diverse graphs. To save human efforts and computational costs, graph neural architecture search (Graph NAS) has been used to search for a sub-optimal GNN architecture that combines existing components. However, there are no existing Graph NAS methods that satisfy explainability, efficiency, and adaptability to various graphs. Therefore, we propose an efficient and explainable Graph NAS method, called ExGNAS, which consists of (i) a simple search space that can adapt to various graphs and (ii) a search algorithm that makes the decision process explainable. The search space includes only fundamental functions that can handle homophilic and heterophilic graphs. The search algorithm efficiently searches for the best GNN architecture via Monte-Carlo tree search without neural models. The combination of our search space and algorithm finds accurate GNN models and identifies the important functions within the search space. We comprehensively evaluate our method against twelve hand-crafted GNN architectures and three Graph NAS methods on four graphs. Our experimental results show that ExGNAS improves AUC by up to 3.6 and reduces run time by up to 78\% compared with the state-of-the-art Graph NAS methods. Furthermore, we show ExGNAS is effective in analyzing the difference between GNN architectures in homophilic and heterophilic graphs.
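Monte-Carlo tree search selects which branch of the search tree to expand by trading off architectures that have scored well against under-explored ones, typically with a UCB1-style score. A generic sketch of that selection rule; ExGNAS's exact formula is not specified in the abstract.

```python
import math

def ucb1(total_value, visits, parent_visits, c=1.4):
    # Unvisited children get infinite score, so each is tried once.
    if visits == 0:
        return float("inf")
    exploit = total_value / visits  # mean reward of this subtree so far
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore
```

During selection, the child with the highest `ucb1` score is followed; after an architecture is evaluated, its reward is backed up along the visited path.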
https://arxiv.org/abs/2308.15734
One-Shot Neural Architecture Search (NAS) algorithms often rely on training a hardware agnostic super-network for a domain specific task. Optimal sub-networks are then extracted from the trained super-network for different hardware platforms. However, training super-networks from scratch can be extremely time consuming and compute intensive especially for large models that rely on a two-stage training process of pre-training and fine-tuning. State of the art pre-trained models are available for a wide range of tasks, but their large sizes significantly limits their applicability on various hardware platforms. We propose InstaTune, a method that leverages off-the-shelf pre-trained weights for large models and generates a super-network during the fine-tuning stage. InstaTune has multiple benefits. Firstly, since the process happens during fine-tuning, it minimizes the overall time and compute resources required for NAS. Secondly, the sub-networks extracted are optimized for the target task, unlike prior work that optimizes on the pre-training objective. Finally, InstaTune is easy to "plug and play" in existing frameworks. By using multi-objective evolutionary search algorithms along with lightly trained predictors, we find Pareto-optimal sub-networks that outperform their respective baselines across different performance objectives such as accuracy and MACs. Specifically, we demonstrate that our approach performs well across both unimodal (ViT and BERT) and multi-modal (BEiT-3) transformer based architectures.
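The Pareto-filtering step behind "Pareto-optimal sub-networks" can be sketched generically for two objectives such as accuracy (maximize) and MACs (minimize). This is illustrative, not InstaTune's code.

```python
def pareto_front(candidates):
    # candidates: list of (accuracy, macs) pairs; maximize accuracy,
    # minimize MACs. A candidate survives if no other candidate is at
    # least as good on both objectives and strictly better on one.
    front = []
    for acc, macs in candidates:
        dominated = any(
            a >= acc and m <= macs and (a > acc or m < macs)
            for a, m in candidates)
        if not dominated:
            front.append((acc, macs))
    return front
```

The evolutionary search repeatedly mutates sub-networks, scores them with the lightly trained predictors, and keeps only this non-dominated set.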
https://arxiv.org/abs/2308.15609
Deep neural networks are increasingly utilized in various machine learning tasks. However, as these models grow in complexity, they often face calibration issues, despite enhanced prediction accuracy. Many studies have endeavored to improve calibration performance through data preprocessing, the use of specific loss functions, and training frameworks. Yet, investigations into calibration properties have been somewhat overlooked. Our study leverages the Neural Architecture Search (NAS) search space, offering an exhaustive model architecture space for thorough calibration properties exploration. We specifically create a model calibration dataset. This dataset evaluates 90 bin-based and 12 additional calibration measurements across 117,702 unique neural networks within the widely employed NATS-Bench search space. Our analysis aims to answer several longstanding questions in the field, using our proposed dataset: (i) Can model calibration be generalized across different tasks? (ii) Can robustness be used as a calibration measurement? (iii) How reliable are calibration metrics? (iv) Does a post-hoc calibration method affect all models uniformly? (v) How does calibration interact with accuracy? (vi) What is the impact of bin size on calibration measurement? (vii) Which architectural designs are beneficial for calibration? Additionally, our study bridges an existing gap by exploring calibration within NAS. By providing this dataset, we enable further research into NAS calibration. As far as we are aware, our research represents the first large-scale investigation into calibration properties and the premier study of calibration issues within NAS.
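A standard bin-based calibration measurement of the kind the dataset records is the Expected Calibration Error (ECE): bin predictions by confidence, then sum the per-bin |accuracy − confidence| gaps weighted by bin occupancy. A minimal sketch with equal-width bins.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    # confidences: predicted max-class probabilities in (0, 1];
    # correct: 1 if the prediction was right, else 0.
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by fraction of samples
    return ece
```

Question (vi) in the abstract concerns exactly the `n_bins` sensitivity of measurements like this one.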
https://arxiv.org/abs/2308.11838
Vision transformers have shown unprecedented levels of performance in tackling various visual perception tasks in recent years. However, the architectural and computational complexity of such network architectures have made them challenging to deploy in real-world applications with high-throughput, low-memory requirements. As such, there has been significant research recently on the design of efficient vision transformer architectures. In this study, we explore the generation of fast vision transformer architecture designs via generative architecture search (GAS) to achieve a strong balance between accuracy and architectural and computational efficiency. Through this generative architecture search process, we create TurboViT, a highly efficient hierarchical vision transformer architecture design that is generated around mask unit attention and Q-pooling design patterns. The resulting TurboViT architecture design achieves significantly lower architectural computational complexity (>2.47$\times$ smaller than FasterViT-0 while achieving same accuracy) and computational complexity (>3.4$\times$ fewer FLOPs and 0.9% higher accuracy than MobileViT2-2.0) when compared to 10 other state-of-the-art efficient vision transformer network architecture designs within a similar range of accuracy on the ImageNet-1K dataset. Furthermore, TurboViT demonstrated strong inference latency and throughput in both low-latency and batch processing scenarios (>3.21$\times$ lower latency and >3.18$\times$ higher throughput compared to FasterViT-0 for low-latency scenario). These promising results demonstrate the efficacy of leveraging generative architecture search for generating efficient transformer architecture designs for high-throughput scenarios.
https://arxiv.org/abs/2308.11421
Zero-shot learning (ZSL) aims to recognize the novel classes which cannot be collected for training a prediction model. Accordingly, generative models (e.g., generative adversarial network (GAN)) are typically used to synthesize the visual samples conditioned by the class semantic vectors and achieve remarkable progress for ZSL. However, existing GAN-based generative ZSL methods are based on hand-crafted models, which cannot adapt to various datasets/scenarios and suffer from training instability. To alleviate these challenges, we propose evolutionary generative adversarial network search (termed EGANS) to automatically design the generative network with good adaptation and stability, enabling reliable visual feature sample synthesis for advancing ZSL. Specifically, we adopt cooperative dual evolution to conduct a neural architecture search for both generator and discriminator under a unified evolutionary adversarial framework. EGANS is learned in two stages: evolution generator architecture search and evolution discriminator architecture search. During the evolution generator architecture search, we adopt a many-to-one adversarial training strategy to evolutionarily search for the optimal generator. Then the optimal generator is further applied to search for the optimal discriminator in the evolution discriminator architecture search with a similar evolution search algorithm. Once the optimal generator and discriminator are searched, we integrate them into various generative ZSL baselines for ZSL classification. Extensive experiments show that EGANS consistently improves existing generative ZSL methods on the standard CUB, SUN, AWA2 and FLO datasets. The significant performance gains indicate that evolutionary neural architecture search explores a virgin field in ZSL.
https://arxiv.org/abs/2308.09915
In this work, we develop a neural architecture search algorithm, termed Resbuilder, that develops ResNet architectures from scratch that achieve high accuracy at moderate computational cost. It can also be used to modify existing architectures and has the capability to remove and insert ResNet blocks, in this way searching for suitable architectures in the space of ResNet architectures. In our experiments on different image classification datasets, Resbuilder achieves close to state-of-the-art performance while saving computational cost compared to off-the-shelf ResNets. Notably, we tune the parameters once on CIFAR10, which yields a suitable default choice for all other datasets. We demonstrate that this property generalizes even to industrial applications by applying our method with default parameters on a proprietary fraud detection dataset.
https://arxiv.org/abs/2308.08504
Infrared and visible image fusion is a powerful technique that combines complementary information from different modalities for downstream semantic perception tasks. Existing learning-based methods show remarkable performance, but are suffering from the inherent vulnerability of adversarial attacks, causing a significant decrease in accuracy. In this work, a perception-aware fusion framework is proposed to promote segmentation robustness in adversarial scenes. We first conduct systematic analyses about the components of image fusion, investigating the correlation with segmentation robustness under adversarial perturbations. Based on these analyses, we propose a harmonized architecture search with a decomposition-based structure to balance standard accuracy and robustness. We also propose an adaptive learning strategy to improve the parameter robustness of image fusion, which can learn effective feature extraction under diverse adversarial perturbations. Thus, the goals of image fusion (\textit{i.e.,} extracting complementary features from source modalities and defending attack) can be realized from the perspectives of architectural and learning strategies. Extensive experimental results demonstrate that our scheme substantially enhances the robustness, with gains of 15.3% mIOU of segmentation in the adversarial scene, compared with advanced competitors. The source codes are available at this https URL.
https://arxiv.org/abs/2308.03979
Deep learning (DL) has been successfully applied to encrypted network traffic classification in experimental settings. However, in production use, it has been shown that a DL classifier's performance inevitably decays over time. Re-training the model on newer datasets has been shown to only partially improve its performance. Manually re-tuning the model architecture to meet the performance expectations on newer datasets is time-consuming and requires domain expertise. We propose AutoML4ETC, a novel tool to automatically design efficient and high-performing neural architectures for encrypted traffic classification. We define a novel, powerful search space tailored specifically for the near real-time classification of encrypted traffic using packet header bytes. We show that with different search strategies over our search space, AutoML4ETC generates neural architectures that outperform the state-of-the-art encrypted traffic classifiers on several datasets, including public benchmark datasets and real-world TLS and QUIC traffic collected from the Orange mobile network. In addition to being more accurate, AutoML4ETC's architectures are significantly more efficient and lighter in terms of the number of parameters. Finally, we make AutoML4ETC publicly available for future research.
https://arxiv.org/abs/2308.02182
Most machine learning (ML) systems assume stationary and matching data distributions during training and deployment. This is often a false assumption. When ML models are deployed on real devices, data distributions often shift over time due to changes in environmental factors, sensor characteristics, and task-of-interest. While it is possible to have a human-in-the-loop to monitor for distribution shifts and engineer new architectures in response to these shifts, such a setup is not cost-effective. Instead, non-stationary automated ML (AutoML) models are needed. This paper presents the Encoder-Adaptor-Reconfigurator (EAR) framework for efficient continual learning under domain shifts. The EAR framework uses a fixed deep neural network (DNN) feature encoder and trains shallow networks on top of the encoder to handle novel data. The EAR framework is capable of 1) detecting when new data is out-of-distribution (OOD) by combining DNNs with hyperdimensional computing (HDC), 2) identifying low-parameter neural adaptors to adapt the model to the OOD data using zero-shot neural architecture search (ZS-NAS), and 3) minimizing catastrophic forgetting on previous tasks by progressively growing the neural architecture as needed and dynamically routing data through the appropriate adaptors and reconfigurators for handling domain-incremental and class-incremental continual learning. We systematically evaluate our approach on several benchmark datasets for domain adaptation and demonstrate strong performance compared to state-of-the-art algorithms for OOD detection and few-/zero-shot NAS.
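The HDC-based OOD check can be sketched with bipolar hypervectors and cosine similarity to class prototypes: in high dimensions random hypervectors are nearly orthogonal, so a query that resembles no prototype is flagged as out-of-distribution. How DNN features are encoded into hypervectors is omitted here, and the EAR paper's exact encoder may differ.

```python
import numpy as np

def random_hypervector(dim, rng):
    # Bipolar {-1, +1} hypervector; with large dim, two random
    # hypervectors have cosine similarity near zero.
    return rng.choice([-1.0, 1.0], size=dim)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def max_prototype_similarity(query_hv, prototypes):
    # A low best-match similarity across all class prototypes
    # flags the query as out-of-distribution.
    return max(cosine(query_hv, p) for p in prototypes)
```

An OOD detection (similarity below a threshold) is what would trigger the ZS-NAS step that grows a new low-parameter adaptor.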
https://arxiv.org/abs/2308.02084
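The EAR abstract above combines DNN features with hyperdimensional computing for OOD detection but gives no concrete recipe; as a hedged illustration of that general idea (all names, the random-projection encoder, and the 0.1 similarity threshold are assumptions, not the paper's actual method), a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def hd_encode(x, proj):
    """Map a real-valued feature vector to a bipolar (+1/-1) hypervector
    via a fixed random projection followed by a sign nonlinearity."""
    return np.sign(proj @ x)

def fit_prototypes(X, y, dim=10_000):
    """Bundle (sum, then sign) the hypervectors of each class into one
    prototype hypervector per class."""
    proj = rng.standard_normal((dim, X.shape[1]))
    protos = {c: np.sign(sum(hd_encode(x, proj) for x in X[y == c]))
              for c in np.unique(y)}
    return proj, protos

def is_ood(x, proj, protos, threshold=0.1):
    """Flag an input as out-of-distribution when its best cosine
    similarity to any class prototype falls below the threshold."""
    hv = hd_encode(x, proj)
    best = max(hv @ p / (np.linalg.norm(hv) * np.linalg.norm(p) + 1e-9)
               for p in protos.values())
    return bool(best < threshold)
```

In high dimensions, random bipolar vectors are nearly orthogonal, so an input far from every training cluster has near-zero similarity to all prototypes and is flagged, while in-distribution inputs match their class prototype closely.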
Deep neural networks (DNNs) are the state-of-the-art technique for solving most computer vision problems. DNNs require billions of parameters and operations to achieve state-of-the-art results. This requirement makes DNNs extremely compute-, memory-, and energy-hungry, and consequently difficult to deploy on small battery-powered Internet-of-Things (IoT) devices with limited computing resources. Deployment of DNNs on IoT devices, such as traffic cameras, can improve public safety by enabling applications such as automatic accident detection and emergency response. In this paper, we survey the recent advances in low-power and energy-efficient DNN implementations that improve the deployability of DNNs without significantly sacrificing accuracy. In general, these techniques either reduce the memory requirements, the number of arithmetic operations, or both. The techniques can be divided into three major categories: neural network compression, network architecture search and design, and compiler and graph optimizations. We survey low-power techniques for both convolutional and transformer DNNs, and summarize their advantages, disadvantages, and open research problems.
https://arxiv.org/abs/2308.02553
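Of the three technique categories the survey above names, neural network compression is the simplest to illustrate; a minimal sketch of one such technique, symmetric post-training int8 quantization, which stores 8-bit integers plus a single float scale for roughly a 4x reduction in weight memory versus float32 (function names are illustrative, not from the survey):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: map float weights onto the
    integer range [-127, 127] using one shared scale factor."""
    m = float(np.abs(w).max())
    scale = m / 127.0 if m > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 tensor."""
    return q.astype(np.float32) * scale
```

The round-trip error per weight is bounded by half the scale step, which is why this works well on the smooth weight distributions typical of trained networks.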
This paper presents a study on the use of Convolutional Neural Network (CNN), ResNet50, InceptionV3, EfficientNetB0 and NASNetMobile models to efficiently detect brain tumors, with the goals of reducing the time required for manual review of reports and creating an automated system for classifying brain tumors. An automated pipeline is proposed that encompasses five models: CNN, ResNet50, InceptionV3, EfficientNetB0 and NASNetMobile. The performance of the proposed architecture is evaluated on a balanced dataset and found to yield an accuracy of 99.33% for the fine-tuned InceptionV3 model. Furthermore, Explainable AI approaches are incorporated to visualize the model's latent behavior and shed light on its black-box nature. To further optimize the training process, a cost-sensitive neural network approach is proposed for working with imbalanced datasets, which achieved almost 4% higher accuracy than the conventional models used in our experiments. The cost-sensitive InceptionV3 (CS-InceptionV3) and CNN (CS-CNN) show a promising accuracy of 92.31% and a recall value of 1.00, respectively, on an imbalanced dataset. The proposed models show great potential in improving tumor detection accuracy and must be further developed for application in practical solutions. We have provided the datasets and made our implementations publicly available at - this https URL
https://arxiv.org/abs/2308.00608
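The cost-sensitive models above (CS-InceptionV3, CS-CNN) rest on the idea of weighting the loss by class so that errors on the rare class cost more; as a hedged sketch of that core mechanism, not the paper's exact loss (the function name and weights are illustrative):

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights):
    """Cross-entropy in which each sample's negative log-likelihood is
    scaled by the weight of its true class, so up-weighting the minority
    class penalizes its misclassification more heavily."""
    w = class_weights[labels]
    nll = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return float(np.sum(w * nll) / np.sum(w))
```

With an imbalanced dataset, setting the minority-class weight to, say, the inverse class frequency pushes the optimizer toward the high minority-class recall the abstract reports.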