Recently, several approaches have successfully demonstrated that weight-sharing Neural Architecture Search (NAS) can effectively explore a search space of elastic low-rank adapters (LoRA), enabling parameter-efficient fine-tuning (PEFT) and compression of large language models. In this paper, we introduce a novel approach called Shears, demonstrating how the integration of cost-effective sparsity and a proposed Neural Low-rank adapter Search (NLS) algorithm can further improve the efficiency of PEFT approaches. Results demonstrate the benefits of Shears compared to other methods, reaching high sparsity levels while improving accuracy or incurring only a small drop, using a single GPU for a couple of hours.
https://arxiv.org/abs/2404.10934
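To make the idea above concrete, here is a minimal sketch of an elastic LoRA adapter whose active rank can be switched during a low-rank adapter search; the class name, rank choices, and the way the sub-rank is activated are illustrative assumptions, not the Shears implementation.

```python
import torch
import torch.nn as nn

class ElasticLoRALinear(nn.Module):
    """Frozen linear layer with a LoRA adapter whose active rank can be
    switched at search time (illustrative sketch, not the Shears code)."""

    def __init__(self, base: nn.Linear, max_rank: int = 32, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # keep the pretrained (possibly sparsified) weights frozen
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(max_rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, max_rank))
        self.scaling = alpha / max_rank
        self.active_rank = max_rank  # a search controller changes this per layer

    def set_rank(self, r: int):
        self.active_rank = r

    def forward(self, x):
        r = self.active_rank
        delta = (x @ self.lora_A[:r].T) @ self.lora_B[:, :r].T
        return self.base(x) + self.scaling * delta

# during the search, candidate rank configurations are sampled and evaluated
layer = ElasticLoRALinear(nn.Linear(768, 768), max_rank=32)
for rank in (8, 16, 32):          # hypothetical rank choices in the search space
    layer.set_rank(rank)
    _ = layer(torch.randn(4, 768))
```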
We present the latest generation of MobileNets, known as MobileNetV4 (MNv4), featuring universally efficient architecture designs for mobile devices. At its core, we introduce the Universal Inverted Bottleneck (UIB) search block, a unified and flexible structure that merges Inverted Bottleneck (IB), ConvNext, Feed Forward Network (FFN), and a novel Extra Depthwise (ExtraDW) variant. Alongside UIB, we present Mobile MQA, an attention block tailored for mobile accelerators, delivering a significant 39% speedup. An optimized neural architecture search (NAS) recipe is also introduced which improves MNv4 search effectiveness. The integration of UIB, Mobile MQA and the refined NAS recipe results in a new suite of MNv4 models that are mostly Pareto optimal across mobile CPUs, DSPs, GPUs, as well as specialized accelerators like Apple Neural Engine and Google Pixel EdgeTPU - a characteristic not found in any other models tested. Finally, to further boost accuracy, we introduce a novel distillation technique. Enhanced by this technique, our MNv4-Hybrid-Large model delivers 87% ImageNet-1K accuracy, with a Pixel 8 EdgeTPU runtime of just 3.8ms.
https://arxiv.org/abs/2404.10518
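A rough sketch of the Universal Inverted Bottleneck idea as described in the abstract: two optional depthwise convolutions around the pointwise expansion/projection pair, whose presence or absence recovers the IB, ConvNext-like, FFN, and ExtraDW variants. Kernel sizes, activation, and the residual connection are assumptions for illustration, not the released MNv4 code.

```python
import torch
import torch.nn as nn

def conv_bn_act(cin, cout, k, groups=1):
    return nn.Sequential(
        nn.Conv2d(cin, cout, k, padding=k // 2, groups=groups, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(),
    )

class UIBBlock(nn.Module):
    """Sketch of a Universal-Inverted-Bottleneck-style block: two *optional*
    depthwise convs around a pointwise expansion/projection pair.
    (start_dw, mid_dw) = (False, True) ~ classic IB, (True, False) ~ ConvNext-like,
    (False, False) ~ FFN, (True, True) ~ ExtraDW. Illustrative only."""

    def __init__(self, channels, expand_ratio=4, start_dw=True, mid_dw=True):
        super().__init__()
        hidden = channels * expand_ratio
        self.start_dw = conv_bn_act(channels, channels, 3, groups=channels) if start_dw else nn.Identity()
        self.expand = conv_bn_act(channels, hidden, 1)
        self.mid_dw = conv_bn_act(hidden, hidden, 3, groups=hidden) if mid_dw else nn.Identity()
        self.project = nn.Sequential(
            nn.Conv2d(hidden, channels, 1, bias=False), nn.BatchNorm2d(channels))

    def forward(self, x):
        return x + self.project(self.mid_dw(self.expand(self.start_dw(x))))

block = UIBBlock(64, start_dw=True, mid_dw=True)   # the "ExtraDW" variant
out = block(torch.randn(1, 64, 32, 32))
```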
Hardware-aware Neural Architecture Search approaches (HW-NAS) automate the design of deep learning architectures tailored specifically to a given target hardware platform. Yet, these techniques demand substantial computational resources, primarily due to the expensive process of assessing the performance of identified architectures. To alleviate this problem, a recent direction in the literature has employed a representation similarity metric for efficiently evaluating architecture performance. Nonetheless, since it is inherently a single-objective method, it requires multiple runs to identify the optimal architecture set satisfying the diverse hardware cost constraints, thereby increasing the search cost. Furthermore, simply converting the single objective into a multi-objective approach results in an under-explored architectural search space. In this study, we propose a Multi-Objective method to address the HW-NAS problem, called MO-HDNAS, to identify the trade-off set of architectures in a single run with low computational cost. This is achieved by optimizing three objectives: maximizing the representation similarity metric, minimizing hardware cost, and maximizing the hardware cost diversity. The third objective, i.e. hardware cost diversity, is used to facilitate a better exploration of the architecture search space. Experimental results demonstrate the effectiveness of our proposed method in efficiently addressing the HW-NAS problem across six edge devices for the image classification task.
https://arxiv.org/abs/2404.12403
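A hedged sketch of how the three objectives could be assembled into a fitness vector for a standard multi-objective evolutionary search; the diversity term below (distance to the nearest hardware cost already in the population) is an assumed stand-in for the paper's hardware-cost-diversity objective.

```python
import numpy as np

def objectives(arch, population_costs, similarity_metric, hardware_cost):
    """Return the three objectives for one architecture (sketch; the exact
    diversity formulation used by MO-HDNAS is not reproduced here)."""
    sim = similarity_metric(arch)            # to be maximized
    cost = hardware_cost(arch)               # to be minimized
    # assumed diversity proxy: distance of this cost to the nearest cost
    # already present in the population (to be maximized)
    diversity = min(abs(cost - c) for c in population_costs) if population_costs else 0.0
    # convert everything to a minimization vector for a standard multi-objective EA
    return np.array([-sim, cost, -diversity])

def dominates(a, b):
    """Pareto dominance for minimization vectors."""
    return np.all(a <= b) and np.any(a < b)
```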
The key to the device-edge co-inference paradigm is to partition models into computation-friendly and computation-intensive parts across the device and the edge, respectively. However, for Graph Neural Networks (GNNs), we find that simply partitioning without altering their structures can hardly achieve the full potential of the co-inference paradigm due to various computation-communication overheads of GNN operations over heterogeneous devices. We present GCoDE, the first automatic framework for GNNs that innovatively co-designs the architecture search and the mapping of each operation onto Device-Edge hierarchies. GCoDE abstracts the device communication process into an explicit operation and fuses the search of architecture and the operations mapping in a unified space for joint optimization. Also, the performance-aware approach, utilized in the constraint-based search process of GCoDE, enables effective evaluation of architecture efficiency in diverse heterogeneous systems. We implement the co-inference engine and runtime dispatcher in GCoDE to enhance the deployment efficiency. Experimental results show that GCoDE can achieve up to $44.9\times$ speedup and $98.2\%$ energy reduction compared to existing approaches across various applications and system configurations.
https://arxiv.org/abs/2404.05605
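An illustrative encoding of the unified search space idea: each candidate pairs an operation choice with a device/edge placement, and a tier change inserts an explicit communication operation. The operator names and the encoding itself are assumptions, not GCoDE's actual representation.

```python
import random

OPS = ["gcn", "gat", "sage"]          # hypothetical GNN operator choices
DEVICES = ["device", "edge"]          # where each operation runs

def sample_candidate(num_layers=4):
    """Sample one point of a unified architecture + mapping search space
    (sketch): every layer carries both an op and a placement, and transfers
    between tiers are made explicit operations."""
    genotype = [{"op": random.choice(OPS), "place": random.choice(DEVICES)}
                for _ in range(num_layers)]
    plan = []
    for prev, cur in zip(genotype, genotype[1:]):
        plan.append(prev)
        if prev["place"] != cur["place"]:
            # communication abstracted as its own operation in the plan
            plan.append({"op": "transfer", "place": f'{prev["place"]}->{cur["place"]}'})
    plan.append(genotype[-1])
    return plan

print(sample_candidate())
```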
If our noise-canceling headphones can understand our audio environments, they can then inform us of important sound events, tune equalization based on the types of content we listen to, and dynamically adjust noise cancellation parameters based on audio scenes to further reduce distraction. However, running multiple audio understanding models on headphones with a limited energy budget and on-chip memory remains a challenging task. In this work, we identify a new class of neural network accelerators (e.g., NE16 on GAP9) that allows network weights to be quantized to different common (e.g., 8 bits) and uncommon bit-widths (e.g., 3 bits). We then applied a differentiable neural architecture search to search over the optimal bit-widths of a network on two different sound event detection tasks with potentially different requirements on quantization and prediction granularity (i.e., classification vs. embeddings for few-shot learning). We further evaluated our quantized models on actual hardware, showing that we reduce memory usage, inference latency, and energy consumption by an average of 62%, 46%, and 61% respectively compared to 8-bit models while maintaining floating point performance. Our work sheds light on the benefits of such accelerators on sound event detection tasks when combined with an appropriate search method.
https://arxiv.org/abs/2404.04386
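A minimal sketch of differentiable bit-width search in the spirit described above: each candidate bit-width produces a fake-quantized copy of the weights, and the forward pass mixes them with softmax-weighted architecture parameters. The candidate bit set, the quantizer, and the layer type are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(w, bits):
    """Uniform symmetric fake quantization with straight-through rounding."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    return w + (q - w).detach()   # straight-through estimator

class MixedPrecisionLinear(nn.Module):
    """Linear layer whose weight bit-width is chosen by differentiable NAS:
    the forward pass mixes candidate bit-widths with softmax weights
    (sketch; candidate set and layer type are assumptions)."""

    def __init__(self, in_f, out_f, candidate_bits=(3, 4, 8)):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.02)
        self.candidate_bits = candidate_bits
        self.alpha = nn.Parameter(torch.zeros(len(candidate_bits)))  # architecture parameters

    def forward(self, x):
        probs = F.softmax(self.alpha, dim=0)
        w = sum(p * fake_quantize(self.weight, b)
                for p, b in zip(probs, self.candidate_bits))
        return F.linear(x, w)

layer = MixedPrecisionLinear(64, 32)
y = layer(torch.randn(8, 64))
# after search, the final bit-width is the argmax over alpha
```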
The boundless space of neural networks that could be used to solve a problem -- each with different performance -- leads to a situation where a Deep Learning expert is still required to identify the best network. This undermines the hope of removing the need for experts. Neural Architecture Search (NAS) offers a solution to this by automatically identifying the best architecture. However, to date, NAS work has focused on a small set of datasets which we argue are not representative of real-world problems. We introduce eight new datasets created for a series of NAS Challenges: AddNIST, Language, MultNIST, CIFARTile, Gutenberg, Isabella, GeoClassing, and Chesseract. These datasets and challenges are developed to direct attention to issues in NAS development and to encourage authors to consider how their models will perform on datasets unknown to them at development time. We present experimentation using standard Deep Learning methods as well as the best results from challenge participants.
https://arxiv.org/abs/2404.02189
Neural architecture search (NAS) is an effective method for discovering new convolutional neural network (CNN) architectures. However, existing approaches often require time-consuming training or intensive sampling and evaluation. Zero-shot NAS aims to create training-free proxies for architecture performance prediction. However, existing proxies have suboptimal performance and are often outperformed by simple metrics such as model parameter counts or the number of floating-point operations. Besides, existing model-based proxies cannot generalize to new search spaces with unseen operator types when no ground-truth accuracies are available. A universally optimal proxy remains elusive. We introduce TG-NAS, a novel model-based universal proxy that leverages a transformer-based operator embedding generator and a graph convolution network (GCN) to predict architecture performance. This approach guides neural architecture search across any given search space without the need for retraining. Distinct from other model-based predictor subroutines, TG-NAS itself acts as a zero-cost (ZC) proxy, guiding architecture search with advantages in terms of data independence, cost-effectiveness, and consistency across diverse search spaces. Our experiments showcase its advantages over existing proxies across various NAS benchmarks, suggesting its potential as a foundational element for efficient architecture search. TG-NAS achieves up to 300X improvements in search efficiency compared to previous SOTA ZC proxy methods. Notably, it discovers competitive models with 93.75% CIFAR-10 accuracy on the NAS-Bench-201 space and 74.5% ImageNet top-1 accuracy on the DARTS space.
https://arxiv.org/abs/2404.00271
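A toy sketch of the model-based proxy idea: operator embeddings placed on a cell's DAG are aggregated by a small graph convolution network to predict relative performance. Here random embeddings stand in for TG-NAS's transformer-based operator embedding generator, and the GCN and readout are simplified assumptions.

```python
import torch
import torch.nn as nn

class TinyGCNPredictor(nn.Module):
    """Toy GCN performance predictor over a cell's operator-embedding graph
    (sketch of the idea; the transformer embedding generator is replaced
    here by pre-computed node embeddings for illustration)."""

    def __init__(self, emb_dim=32, hidden=64):
        super().__init__()
        self.gc1 = nn.Linear(emb_dim, hidden)
        self.gc2 = nn.Linear(hidden, hidden)
        self.readout = nn.Linear(hidden, 1)

    def forward(self, node_emb, adj):
        # symmetric normalization of the adjacency (with self-loops)
        a = adj + torch.eye(adj.size(0))
        d = a.sum(dim=1)
        a_norm = a / torch.sqrt(d.unsqueeze(0) * d.unsqueeze(1))
        h = torch.relu(self.gc1(a_norm @ node_emb))
        h = torch.relu(self.gc2(a_norm @ h))
        return self.readout(h.mean(dim=0))   # predicted (relative) performance

# hypothetical 4-node cell: one embedding per operation plus the DAG adjacency
node_emb = torch.randn(4, 32)
adj = torch.tensor([[0, 1, 1, 0],
                    [0, 0, 0, 1],
                    [0, 0, 0, 1],
                    [0, 0, 0, 0]], dtype=torch.float32)
score = TinyGCNPredictor()(node_emb, adj)
```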
Compression of large and performant vision foundation models (VFMs) into arbitrary bit-wise operations (BitOPs) allows their deployment on various hardware. We propose to fine-tune a VFM to a mixed-precision quantized supernet. Supernet-based neural architecture search (NAS) can be adopted for this purpose: a supernet is trained, and then subnets within arbitrary hardware budgets can be extracted. However, existing methods face difficulties in optimizing the mixed-precision search space and incur large memory costs during training. To tackle these challenges, first, we study the effective search space design for fine-tuning a VFM by comparing different operators (such as resolution, feature size, width, depth, and bit-widths) in terms of performance and BitOPs reduction. Second, we propose memory-efficient supernet training using a low-rank adapter (LoRA) and a progressive training strategy. The proposed method is evaluated for the recently proposed VFM, Segment Anything Model, fine-tuned on segmentation tasks. The searched model yields about a 95% reduction in BitOPs without incurring performance degradation.
https://arxiv.org/abs/2403.20080
Training-free network architecture search (NAS) aims to discover high-performing networks with zero-cost proxies, capturing network characteristics related to the final performance. However, network rankings estimated by previous training-free NAS methods have shown weak correlations with the performance. To address this issue, we propose AZ-NAS, a novel approach that leverages an ensemble of various zero-cost proxies to substantially enhance the correlation between the predicted ranking of networks and their ground-truth performance. To achieve this, we introduce four novel zero-cost proxies that are complementary to each other, analyzing distinct traits of architectures in terms of expressivity, progressivity, trainability, and complexity. The proxy scores can be obtained simultaneously within a single forward and backward pass, making the overall NAS process highly efficient. In order to integrate the rankings predicted by our proxies effectively, we introduce a non-linear ranking aggregation method that highlights the networks ranked highly and consistently across all the proxies. Experimental results conclusively demonstrate the efficacy and efficiency of AZ-NAS, outperforming state-of-the-art methods on standard benchmarks, all while maintaining a reasonable runtime cost.
https://arxiv.org/abs/2403.19232
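A hedged sketch of a non-linear ranking aggregation with the property described above: summing the logarithms of normalized per-proxy ranks rewards networks that are ranked highly by every proxy, since a single poor rank strongly penalizes the total. The exact aggregation function used by AZ-NAS may differ.

```python
import numpy as np

def aggregate_rankings(proxy_scores):
    """Non-linear aggregation of per-proxy rankings (illustrative; the exact
    aggregation of AZ-NAS may differ). proxy_scores: dict mapping proxy name
    -> array of scores, higher = better, one entry per candidate network."""
    n = len(next(iter(proxy_scores.values())))
    total = np.zeros(n)
    for scores in proxy_scores.values():
        order = np.argsort(-np.asarray(scores))   # rank 1 = best
        ranks = np.empty(n)
        ranks[order] = np.arange(1, n + 1)
        # log of the normalized rank: one bad rank strongly penalizes a network,
        # so consistently top-ranked networks come out ahead
        total += np.log(ranks / n)
    return total   # higher (closer to 0) = better aggregated ranking

scores = {"expressivity": [0.9, 0.2, 0.5],
          "trainability": [0.8, 0.4, 0.1],
          "complexity":   [0.7, 0.6, 0.3]}
print(aggregate_rankings(scores))   # network 0 is ranked best by all proxies
```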
Deep implicit functions have been found to be an effective tool for efficiently encoding all manner of natural signals. Their attractiveness stems from their ability to compactly represent signals with little to no offline training data. Instead, they leverage the implicit bias of deep networks to decouple hidden redundancies within the signal. In this paper, we explore the hypothesis that additional compression can be achieved by leveraging the redundancies that exist between layers. We propose to use a novel run-time decoder-only hypernetwork - which uses no offline training data - to better model this cross-layer parameter redundancy. Previous applications of hypernetworks with deep implicit functions have applied feed-forward encoder/decoder frameworks that rely on large offline datasets and do not generalize beyond the signals they were trained on. We instead present a strategy for the initialization of run-time deep implicit functions for single-instance signals through a Decoder-Only randomly projected Hypernetwork (D'OH). By directly changing the dimension of a latent code to approximate a target implicit neural architecture, we provide a natural way to vary the memory footprint of neural representations without the costly need for neural architecture search over a space of alternative low-rate structures.
https://arxiv.org/abs/2403.19163
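A minimal sketch of a decoder-only, randomly projected hypernetwork: a single trainable latent code is decoded into the weights of a small implicit MLP through fixed random projection matrices, so the representation's memory footprint is governed by the latent dimension. Layer shapes, initialization, and scaling are illustrative assumptions, not the D'OH configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderOnlyHypernet(nn.Module):
    """Sketch of a decoder-only, randomly projected hypernetwork for a
    single-instance signal: only the latent code z is trained; each layer's
    weights are decoded from z by a fixed random matrix."""

    def __init__(self, latent_dim=128, layer_shapes=((2, 64), (64, 64), (64, 3))):
        super().__init__()
        self.z = nn.Parameter(torch.randn(latent_dim) * 0.01)   # the only trained tensor
        self.layer_shapes = layer_shapes
        for i, (fan_in, fan_out) in enumerate(layer_shapes):
            proj = torch.randn(fan_in * fan_out + fan_out, latent_dim) / latent_dim ** 0.5
            self.register_buffer(f"proj_{i}", proj)             # fixed random decoder

    def forward(self, coords):
        h = coords
        for i, (fan_in, fan_out) in enumerate(self.layer_shapes):
            params = getattr(self, f"proj_{i}") @ self.z        # decode this layer's weights
            w = params[: fan_in * fan_out].view(fan_out, fan_in)
            b = params[fan_in * fan_out:]
            h = F.linear(h, w, b)
            if i < len(self.layer_shapes) - 1:
                h = torch.relu(h)
        return h

model = DecoderOnlyHypernet(latent_dim=64)
rgb = model(torch.rand(1024, 2))   # e.g. pixel coordinates -> predicted RGB
```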
Pre-training of language models on large text corpora is common practice in Natural Language Processing. These models are then fine-tuned to achieve the best results on a variety of tasks. In this paper we question the common practice of only adding a single output layer as a classification head on top of the network. We perform an AutoML search to find architectures that outperform the current single layer at only a small compute cost. We validate our classification architecture on a variety of NLP benchmarks from the GLUE dataset.
https://arxiv.org/abs/2403.18547
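A sketch of the kind of search the paper describes: enumerate small classification-head configurations on top of a frozen encoder and keep the best one on a development set. The search space, the grid-search procedure, and the evaluate_on_dev_set placeholder are assumptions for illustration, not the paper's AutoML setup.

```python
import itertools
import torch
import torch.nn as nn

def build_head(hidden_dims, activation, dropout, in_dim=768, num_classes=2):
    """Assemble a candidate classification head from a configuration."""
    acts = {"relu": nn.ReLU, "gelu": nn.GELU, "tanh": nn.Tanh}
    layers, dim = [], in_dim
    for h in hidden_dims:
        layers += [nn.Linear(dim, h), acts[activation](), nn.Dropout(dropout)]
        dim = h
    layers.append(nn.Linear(dim, num_classes))
    return nn.Sequential(*layers)

def evaluate_on_dev_set(head):
    """Placeholder for 'fine-tune the head on the task and return dev accuracy'."""
    return torch.rand(1).item()

# naive grid search over head configurations on top of a frozen encoder
search_space = itertools.product(
    [(), (256,), (512, 128)],      # hidden layer widths ("()" = single output layer)
    ["relu", "gelu", "tanh"],      # activation
    [0.0, 0.1],                    # dropout
)
best = None
for hidden, act, drop in search_space:
    score = evaluate_on_dev_set(build_head(hidden, act, drop))
    if best is None or score > best[0]:
        best = (score, hidden, act, drop)
print(best)
```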
Hand gesture recognition (HGR) based on multimodal data has attracted considerable attention owing to its great potential in applications. Various manually designed multimodal deep networks have performed well in multimodal HGR (MHGR), but most existing algorithms require a lot of expert experience and time-consuming manual trials. To address these issues, we propose an evolutionary network architecture search framework with adaptive multimodal fusion (AMF-ENAS). Specifically, we design an encoding space that simultaneously considers fusion positions and ratios of the multimodal data, allowing for the automatic construction of multimodal networks with different architectures through decoding. Additionally, we consider three input streams corresponding to intra-modal surface electromyography (sEMG), intra-modal accelerometer (ACC), and inter-modal sEMG-ACC data. To automatically adapt to various datasets, the ENAS framework is designed to automatically search for an MHGR network with appropriate fusion positions and ratios. To the best of our knowledge, this is the first time that ENAS has been utilized in MHGR to tackle issues related to the fusion position and ratio of multimodal data. Experimental results demonstrate that AMF-ENAS achieves state-of-the-art performance on the Ninapro DB2, DB3, and DB7 datasets.
https://arxiv.org/abs/2403.18208
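A toy sketch of decoding a fixed-length chromosome into fusion positions and fusion ratios for the sEMG/ACC branches, in the spirit of the encoding space described above; the gene layout and decoding rules are invented for illustration and do not reproduce the paper's encoding.

```python
import random

def decode_chromosome(genes, num_blocks=6):
    """Decode a 4-gene chromosome (values in [0, 1)) into a fusion plan:
    the first two genes pick the backbone block at which the ACC branch and
    the inter-modal sEMG-ACC branch are fused in, the last two give the
    corresponding fusion ratios (illustrative encoding only)."""
    positions = [int(g * num_blocks) % num_blocks for g in genes[:2]]
    ratios = [round(g, 2) for g in genes[2:4]]
    return {"fusion_positions": positions, "fusion_ratios": ratios}

chromosome = [random.random() for _ in range(4)]
print(decode_chromosome(chromosome))
```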
Multiple object tracking is a critical task in autonomous driving. Existing works primarily focus on the heuristic design of neural networks to obtain high accuracy. As tracking accuracy improves, however, neural networks become increasingly complex, posing challenges for their practical application in real driving scenarios due to high latency. In this paper, we explore the use of neural architecture search (NAS) methods to search for efficient architectures for tracking, aiming for low real-time latency while maintaining relatively high accuracy. Another challenge for object tracking is the unreliability of a single sensor; therefore, we propose a multi-modal framework to improve robustness. Experiments demonstrate that our algorithm can run on edge devices within tight latency constraints, greatly reducing the computational requirements for multi-modal object tracking while keeping latency low.
https://arxiv.org/abs/2403.15712
Speech Emotion Recognition (SER) is crucial for enabling computers to understand the emotions conveyed in human communication. With recent advancements in Deep Learning (DL), the performance of SER models has significantly improved. However, designing an optimal DL architecture requires specialised knowledge and experimental assessment. Fortunately, Neural Architecture Search (NAS) provides a potential solution for automatically determining the best DL model. Differentiable Architecture Search (DARTS) is a particularly efficient method for discovering optimal models. This study presents emoDARTS, a DARTS-optimised joint CNN and Sequential Neural Network (SeqNN: LSTM, RNN) architecture that enhances SER performance. The literature supports the selection of CNN and LSTM coupling to improve performance. While DARTS has previously been used to choose CNN and LSTM operations independently, our technique adds a novel mechanism for selecting CNN and SeqNN operations in conjunction using DARTS. Unlike earlier work, we do not impose limits on the layer order of the CNN. Instead, we let DARTS choose the best layer order inside the DARTS cell. We demonstrate that emoDARTS outperforms conventionally designed CNN-LSTM models and surpasses the best-reported SER results achieved through DARTS on CNN-LSTM by evaluating our approach on the IEMOCAP, MSP-IMPROV, and MSP-Podcast datasets.
https://arxiv.org/abs/2403.14083
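A minimal sketch of the DARTS-style mixed operation that underlies this kind of search: candidate operations are blended with softmax-weighted architecture parameters, making the operation choice differentiable. The candidate set below (two 1-D convolutions and identity) is an illustrative assumption; emoDARTS additionally places recurrent (SeqNN) operations in the search space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """DARTS-style mixed operation: candidate layers are combined with softmax
    weights over architecture parameters, so the choice of operation becomes
    differentiable. Candidate set here is an illustrative assumption."""

    def __init__(self, dim=64):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Sequential(nn.Conv1d(dim, dim, 3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv1d(dim, dim, 5, padding=2), nn.ReLU()),
            nn.Identity(),
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))   # architecture parameters

    def forward(self, x):                      # x: (batch, dim, time)
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

op = MixedOp(dim=64)
y = op(torch.randn(2, 64, 100))
# after the search, the op with the largest alpha is kept (discretization step)
```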
Neural Architecture Search is a costly practice. The fact that a search space can span a vast number of design choices, with each architecture evaluation taking nontrivial overhead, makes it hard for an algorithm to sufficiently explore candidate networks. In this paper, we propose AutoBuild, a scheme which learns to align the latent embeddings of operations and architecture modules with the ground-truth performance of the architectures they appear in. By doing so, AutoBuild is capable of assigning interpretable importance scores to architecture modules, such as individual operation features and larger macro operation sequences, so that high-performance neural networks can be constructed without any need for search. Through experiments performed on state-of-the-art image classification, segmentation, and Stable Diffusion models, we show that by mining a relatively small set of evaluated architectures, AutoBuild can learn to build high-quality architectures directly or help to reduce the search space to focus on relevant areas, finding better architectures that outperform both the original labeled ones and ones found by search baselines. Code available at this https URL
https://arxiv.org/abs/2403.13293
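A hedged sketch of the embedding-alignment idea: given a small set of evaluated architectures, a pairwise ranking loss pushes the embedding magnitude of better-performing architectures above that of worse ones, so magnitude can later serve as an interpretable importance score. This illustrates the principle only and is not AutoBuild's actual objective.

```python
import torch

def embedding_alignment_loss(embeddings, accuracies, margin=0.1):
    """Pairwise ranking loss aligning embedding magnitude with ground-truth
    performance (illustrative sketch, not AutoBuild's exact formulation).
    embeddings: (n, d) latent vectors of evaluated architectures/modules,
    accuracies: (n,) their measured performance."""
    norms = embeddings.norm(dim=1)            # one scalar "importance" per entry
    loss = torch.zeros(())
    for i in range(len(accuracies)):
        for j in range(len(accuracies)):
            if accuracies[i] > accuracies[j]:
                # better architecture should have the larger embedding norm
                loss = loss + torch.relu(margin - (norms[i] - norms[j]))
    return loss

emb = torch.randn(4, 16, requires_grad=True)   # embeddings of 4 evaluated architectures
acc = torch.tensor([0.71, 0.88, 0.65, 0.90])
embedding_alignment_loss(emb, acc).backward()
```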
Recent developments in neural architecture search (NAS) emphasize the significance of considering architectures that are robust against malicious data. However, there is a notable absence of benchmark evaluations and theoretical guarantees for searching for these robust architectures, especially when adversarial training is considered. In this work, we aim to address these two challenges, making a twofold contribution. First, we release a comprehensive data set that encompasses both clean accuracy and robust accuracy for a vast array of adversarially trained networks from the NAS-Bench-201 search space on image datasets. Then, leveraging the neural tangent kernel (NTK) tool from deep learning theory, we establish a generalization theory for searching architectures in terms of clean accuracy and robust accuracy under multi-objective adversarial training. We firmly believe that our benchmark and theoretical insights will significantly benefit the NAS community through reliable reproducibility, efficient assessment, and a theoretical foundation, particularly in the pursuit of robust architectures.
https://arxiv.org/abs/2403.13134
Autonomous driving systems are a rapidly evolving technology that enables driverless car production. Trajectory prediction is a critical component of autonomous driving systems, enabling cars to anticipate the movements of surrounding objects for safe navigation. Trajectory prediction from Lidar point-cloud data performs better than prediction from 2D images because point clouds provide 3D information. However, processing point-cloud data is more complicated and time-consuming than processing 2D images. Hence, state-of-the-art 3D trajectory predictions using point-cloud data suffer from slow and erroneous predictions. This paper introduces TrajectoryNAS, a pioneering method that focuses on utilizing point cloud data for trajectory prediction. By leveraging Neural Architecture Search (NAS), TrajectoryNAS automates the design of trajectory prediction models, encompassing object detection, tracking, and forecasting in a cohesive manner. This approach not only addresses the complex interdependencies among these tasks but also emphasizes the importance of accuracy and efficiency in trajectory modeling. Through empirical studies, TrajectoryNAS demonstrates its effectiveness in enhancing the performance of autonomous driving systems, marking a significant advancement in the field. Experimental results reveal that TrajectoryNAS yields a minimum of 4.8 higher accuracy and 1.1x lower latency over competing methods on the NuScenes dataset.
https://arxiv.org/abs/2403.11695
The supernet is a core component in many recent Neural Architecture Search (NAS) methods. It not only helps embody the search space but also provides a (relative) estimation of the final performance of candidate architectures. Thus, it is critical that the top architectures ranked by a supernet be consistent with those ranked by true performance, which is known as the order-preserving ability. In this work, we analyze the order-preserving ability on the whole search space (global) and on a sub-space of top architectures (local), and empirically show that the local order-preserving ability of current two-stage NAS methods still needs to be improved. To rectify this, we propose a novel concept of Supernet Shifting, a refined search strategy combining architecture search with supernet fine-tuning. Specifically, besides being evaluated, sampled architectures also accumulate training loss during the search, and the supernet is updated every iteration. Since superior architectures are sampled more frequently in evolutionary search, the supernet is encouraged to focus on top architectures, thus improving local order preservation. Besides, a pre-trained supernet is often not reusable for one-shot methods. We show that Supernet Shifting can transfer a supernet to a new dataset. Specifically, the last classifier layer is reset and trained through evolutionary search. Comprehensive experiments show that our method has better order-preserving ability and can find a dominating architecture. Moreover, the pre-trained supernet can easily be transferred to a new dataset with no loss of performance.
https://arxiv.org/abs/2403.11380
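A sketch of the Supernet Shifting loop as described above: every sub-network sampled during evolutionary search also contributes a training-loss update to the supernet, so frequently sampled (promising) architectures gradually dominate the shared weights. The supernet interface, architecture encoding, and mutation operator below are assumptions for illustration.

```python
import random
import torch

OP_CHOICES = list(range(4))        # hypothetical per-layer operation ids

def mutate(arch, p=0.2):
    """Randomly perturb a list-encoded architecture."""
    return [random.choice(OP_CHOICES) if random.random() < p else op for op in arch]

def supernet_shifting_search(supernet, population, train_batches, val_batches,
                             optimizer, criterion, generations=10):
    """Evolutionary search in which each sampled sub-network also accumulates a
    training loss that updates the supernet (sketch; supernet(x, arch) is an
    assumed interface that runs the sub-network defined by `arch`)."""
    for _ in range(generations):
        scored = []
        for arch in population:
            # --- supernet fine-tuning step on this sub-network ---
            x, y = next(train_batches)
            optimizer.zero_grad()
            loss = criterion(supernet(x, arch), y)
            loss.backward()
            optimizer.step()
            # --- evaluation used as the evolutionary fitness ---
            with torch.no_grad():
                xv, yv = next(val_batches)
                acc = (supernet(xv, arch).argmax(1) == yv).float().mean().item()
            scored.append((acc, arch))
        # keep the top half and refill the population by mutating survivors
        scored.sort(key=lambda t: t[0], reverse=True)
        survivors = [a for _, a in scored[: len(scored) // 2]]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(len(scored) - len(survivors))]
    return population[0]
```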
Image segmentation is one of the most fundamental problems in computer vision and has drawn a lot of attention due to its vast applications in image understanding and autonomous driving. However, designing effective and efficient segmentation neural architectures is a labor-intensive process that may require many trials by human experts. In this paper, we address the challenge of efficiently integrating multi-head self-attention into high-resolution representation CNNs by leveraging architecture search. Manually replacing convolution layers with multi-head self-attention is non-trivial due to the costly memory overhead of maintaining high resolution. By contrast, we develop a multi-target multi-branch supernet method, which not only fully utilizes the advantages of high-resolution features, but also finds the proper locations for placing multi-head self-attention modules. Our search algorithm is optimized towards multiple objectives (e.g., latency and mIoU) and is capable of finding architectures on the Pareto frontier with an arbitrary number of branches in a single search. We further present a series of models found via the Hybrid Convolutional-Transformer Architecture Search (HyCTAS) method, which searches for the best hybrid combination of lightweight convolution layers and memory-efficient self-attention layers across branches at different resolutions and fuses them to high resolution for both efficiency and effectiveness. Extensive experiments demonstrate that HyCTAS outperforms previous methods on the semantic segmentation task. Code and models are available at \url{this https URL}.
https://arxiv.org/abs/2403.10413
Benchmarking plays a pivotal role in assessing and enhancing the performance of compact deep learning models designed for execution on resource-constrained devices, such as microcontrollers. Our study introduces a novel, entirely artificially generated benchmarking dataset tailored for speech recognition, representing a core challenge in the field of tiny deep learning. SpokeN-100 consists of spoken numbers from 0 to 99 spoken by 32 different speakers in four different languages, namely English, Mandarin, German and French, resulting in 12,800 audio samples. We determine auditory features and use UMAP (Uniform Manifold Approximation and Projection for Dimension Reduction) as a dimensionality reduction method to show the diversity and richness of the dataset. To highlight the use case of the dataset, we introduce two benchmark tasks: given an audio sample, classify (i) the used language and/or (ii) the spoken number. We optimized state-of-the-art deep neural networks and performed an evolutionary neural architecture search to find tiny architectures optimized for the 32-bit ARM Cortex-M4 nRF52840 microcontroller. Our results represent the first benchmark data achieved for SpokeN-100.
https://arxiv.org/abs/2403.09753
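A small sketch of the visualization step mentioned above: compute a fixed-length auditory feature vector per clip and project the collection to 2-D with UMAP. The MFCC features and file names are assumptions; the abstract does not specify the exact feature set used by the authors.

```python
import numpy as np
import librosa
import umap

def embed_dataset(wav_paths, sr=16000, n_mfcc=20):
    """Compute simple auditory features per clip and project them to 2-D with
    UMAP to visualize dataset diversity (sketch; feature choice is assumed)."""
    feats = []
    for path in wav_paths:
        y, _ = librosa.load(path, sr=sr)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        feats.append(mfcc.mean(axis=1))          # one fixed-length vector per clip
    feats = np.stack(feats)
    return umap.UMAP(n_components=2, random_state=0).fit_transform(feats)

# embedding = embed_dataset(["clip_0001.wav", "clip_0002.wav"])  # hypothetical file names
```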