Deep implicit functions have proven to be an effective tool for efficiently encoding all manner of natural signals. Their attractiveness stems from their ability to compactly represent signals with little to no offline training data. Instead, they leverage the implicit bias of deep networks to decouple hidden redundancies within the signal. In this paper, we explore the hypothesis that additional compression can be achieved by leveraging the redundancies that exist between layers. We propose a novel run-time decoder-only hypernetwork, which uses no offline training data, to better model this cross-layer parameter redundancy. Previous applications of hypernetworks to deep implicit functions have used feed-forward encoder/decoder frameworks that rely on large offline datasets and therefore do not generalize beyond the signals they were trained on. We instead present a strategy for initializing run-time deep implicit functions for single-instance signals through a Decoder-Only randomly projected Hypernetwork (D'OH). By directly varying the dimension of a latent code used to approximate a target implicit neural architecture, we provide a natural way to trade off the memory footprint of neural representations without the costly need for neural architecture search over a space of alternative low-rate structures.
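The core mechanism, generating every layer's weights from one shared trainable latent through fixed random projections, can be sketched as follows (a toy illustration under our own naming, not the authors' code; sizes are arbitrary):

```python
import math
import random

def fixed_projection(rows, cols, rng):
    """A frozen random matrix (never trained); one per target layer."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def project(matrix, vec):
    """Matrix-vector product: shared latent code -> one layer's flat weights."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

layer_sizes = [12, 8]   # flattened weight counts of a small target network
latent_dim = 4          # the only trainable parameters; vary to set the rate

rng = random.Random(0)
projections = [fixed_projection(n, latent_dim, rng) for n in layer_sizes]
latent = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]

# Decoding: every layer's weights come from the same shared latent, so the
# stored footprint is latent_dim values (plus a seed), not sum(layer_sizes).
decoded_layers = [project(P, latent) for P in projections]
```

Only `latent` would be optimized against the signal at run time; changing `latent_dim` directly changes the bit rate of the representation, which is the memory/quality knob the abstract describes.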
https://arxiv.org/abs/2403.19163
Pre-training of language models on large text corpora is common practice in Natural Language Processing. Subsequently, these models are fine-tuned to achieve the best results on a variety of tasks. In this paper we question the common practice of only adding a single output layer as a classification head on top of the network. We perform an AutoML search to find architectures that outperform the current single layer at only a small compute cost. We validate our classification architecture on a variety of NLP benchmarks from the GLUE dataset.
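As a rough illustration of searching over classification-head architectures, here is a toy random search with a synthetic scoring function (not the paper's AutoML procedure; a real search would fine-tune each head on GLUE tasks):

```python
import random

def sample_head(rng):
    """Random classification-head config: depth and per-layer widths."""
    depth = rng.choice([1, 2, 3])
    return {"depth": depth,
            "widths": [rng.choice([64, 128, 256]) for _ in range(depth)]}

def toy_score(head):
    """Synthetic proxy for downstream accuracy; mildly favours a two-layer
    head of moderate width. Purely illustrative."""
    penalty = abs(head["depth"] - 2) + sum(abs(w - 128) for w in head["widths"]) / 256
    return 1.0 / (1.0 + penalty)

rng = random.Random(0)
candidates = [sample_head(rng) for _ in range(50)]
best = max(candidates, key=toy_score)  # compare against the default 1-layer head
```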
https://arxiv.org/abs/2403.18547
Hand gesture recognition (HGR) based on multimodal data has attracted considerable attention owing to its great potential in applications. Various manually designed multimodal deep networks have performed well in multimodal HGR (MHGR), but most existing algorithms require a great deal of expert experience and time-consuming manual trials. To address these issues, we propose an evolutionary network architecture search framework with adaptive multimodal fusion (AMF-ENAS). Specifically, we design an encoding space that simultaneously considers fusion positions and ratios of the multimodal data, allowing for the automatic construction of multimodal networks with different architectures through decoding. Additionally, we consider three input streams corresponding to intra-modal surface electromyography (sEMG), intra-modal accelerometer (ACC), and inter-modal sEMG-ACC. To automatically adapt to various datasets, the ENAS framework is designed to automatically search for an MHGR network with appropriate fusion positions and ratios. To the best of our knowledge, this is the first time that ENAS has been utilized in MHGR to tackle issues related to the fusion position and ratio of multimodal data. Experimental results demonstrate that AMF-ENAS achieves state-of-the-art performance on the Ninapro DB2, DB3, and DB7 datasets.
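The genome encode/decode idea can be sketched as follows (field names, layer count, and ratio ranges are hypothetical, for illustration only):

```python
import random

STREAMS = ["sEMG", "ACC", "sEMG-ACC"]  # two intra-modal + one inter-modal stream
NUM_LAYERS = 6                          # assumed depth of each backbone branch

def random_genome(rng):
    """One gene per stream: (fusion layer index, fusion ratio)."""
    return [(rng.randrange(1, NUM_LAYERS), round(rng.uniform(0.1, 0.9), 2))
            for _ in STREAMS]

def decode(genome):
    """Decode a genome into a per-stream fusion plan a builder could consume."""
    return {stream: {"fuse_at_layer": pos, "fusion_ratio": ratio}
            for stream, (pos, ratio) in zip(STREAMS, genome)}

plan = decode(random_genome(random.Random(0)))
```

An evolutionary loop would mutate and recombine such genomes, decoding each one into a concrete multimodal network before evaluation.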
https://arxiv.org/abs/2403.18208
Multiple object tracking is a critical task in autonomous driving. Existing works primarily focus on the heuristic design of neural networks to obtain high accuracy. As tracking accuracy improves, however, neural networks become increasingly complex, and their high latency poses challenges for practical application in real driving scenarios. In this paper, we explore the use of neural architecture search (NAS) methods to find efficient architectures for tracking, aiming for low real-time latency while maintaining relatively high accuracy. Another challenge for object tracking is the unreliability of a single sensor; we therefore propose a multi-modal framework to improve robustness. Experiments demonstrate that our algorithm can run on edge devices within tight latency constraints, greatly reducing the computational requirements of multi-modal object tracking while keeping latency low.
https://arxiv.org/abs/2403.15712
Speech Emotion Recognition (SER) is crucial for enabling computers to understand the emotions conveyed in human communication. With recent advancements in Deep Learning (DL), the performance of SER models has significantly improved. However, designing an optimal DL architecture requires specialised knowledge and experimental assessments. Fortunately, Neural Architecture Search (NAS) provides a potential solution for automatically determining the best DL model. The Differentiable Architecture Search (DARTS) is a particularly efficient method for discovering optimal models. This study presents emoDARTS, a DARTS-optimised joint CNN and Sequential Neural Network (SeqNN: LSTM, RNN) architecture that enhances SER performance. The literature supports the selection of CNN and LSTM coupling to improve performance. While DARTS has previously been used to choose CNN and LSTM operations independently, our technique adds a novel mechanism for selecting CNN and SeqNN operations in conjunction using DARTS. Unlike earlier work, we do not impose limits on the layer order of the CNN. Instead, we let DARTS choose the best layer order inside the DARTS cell. We demonstrate that emoDARTS outperforms conventionally designed CNN-LSTM models and surpasses the best-reported SER results achieved through DARTS on CNN-LSTM by evaluating our approach on the IEMOCAP, MSP-IMPROV, and MSP-Podcast datasets.
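The continuous DARTS relaxation underlying this search can be illustrated on scalar operations (a generic DARTS sketch, not the emoDARTS code):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Candidate operations on one edge; in emoDARTS the joint space would mix
# CNN-style and sequential (LSTM/RNN) operations.
ops = {
    "identity": lambda x: x,
    "double":   lambda x: 2.0 * x,
    "zero":     lambda x: 0.0,
}

def mixed_op(x, alphas):
    """DARTS relaxation: a softmax-weighted sum over all candidate ops.
    The alphas are the continuous architecture parameters being learned."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, ops.values()))

# Equal architecture parameters -> each op contributes 1/3: (3 + 6 + 0) / 3 = 3.0
assert abs(mixed_op(3.0, [0.0, 0.0, 0.0]) - 3.0) < 1e-9
```

After optimization, each edge is discretized by keeping the op with the largest architecture parameter; letting DARTS order the layers inside the cell, as the abstract describes, means these edges are not constrained to a fixed CNN-then-LSTM sequence.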
https://arxiv.org/abs/2403.14083
Neural Architecture Search is a costly practice. The fact that a search space can span a vast number of design choices, with each architecture evaluation taking nontrivial overhead, makes it hard for an algorithm to sufficiently explore candidate networks. In this paper, we propose AutoBuild, a scheme which learns to align the latent embeddings of operations and architecture modules with the ground-truth performance of the architectures they appear in. By doing so, AutoBuild is capable of assigning interpretable importance scores to architecture modules, such as individual operation features and larger macro operation sequences, so that high-performance neural networks can be constructed without any need for search. Through experiments performed on state-of-the-art image classification, segmentation, and Stable Diffusion models, we show that by mining a relatively small set of evaluated architectures, AutoBuild can learn to build high-quality architectures directly or help reduce the search space to focus on relevant areas, finding better architectures that outperform both the original labeled ones and those found by search baselines. Code available at this https URL
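The search-free construction idea can be illustrated with made-up importance scores (all module names and numbers below are hypothetical):

```python
# Hypothetical importance scores per architecture slot, as AutoBuild would
# assign after mining a set of evaluated architectures (numbers are made up).
module_scores = {
    "stem":    {"conv3x3": 0.90, "conv5x5": 0.60},
    "block_1": {"residual": 0.80, "plain": 0.50, "bottleneck": 0.70},
    "block_2": {"attention": 0.85, "residual": 0.70},
    "head":    {"avgpool_fc": 0.75, "fc_only": 0.40},
}

def build_greedy(scores):
    """Pick the highest-scoring module at every slot: no search loop at all."""
    return {slot: max(choices, key=choices.get) for slot, choices in scores.items()}

arch = build_greedy(module_scores)
```

The hard part the paper addresses is learning scores like these so that greedy (or pruned-space) construction actually tracks ground-truth performance.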
https://arxiv.org/abs/2403.13293
Recent developments in neural architecture search (NAS) emphasize the significance of considering robust architectures against malicious data. However, there is a notable absence of benchmark evaluations and theoretical guarantees for searching these robust architectures, especially when adversarial training is considered. In this work, we aim to address these two challenges, making twofold contributions. First, we release a comprehensive data set that encompasses both clean accuracy and robust accuracy for a vast array of adversarially trained networks from the NAS-Bench-201 search space on image datasets. Then, leveraging the neural tangent kernel (NTK) tool from deep learning theory, we establish a generalization theory for searching architecture in terms of clean accuracy and robust accuracy under multi-objective adversarial training. We firmly believe that our benchmark and theoretical insights will significantly benefit the NAS community through reliable reproducibility, efficient assessment, and theoretical foundation, particularly in the pursuit of robust architectures.
https://arxiv.org/abs/2403.13134
Autonomous driving systems are a rapidly evolving technology that enables driverless car production. Trajectory prediction is a critical component of autonomous driving systems, enabling cars to anticipate the movements of surrounding objects for safe navigation. Trajectory prediction from Lidar point-cloud data performs better than prediction from 2D images because it provides 3D information. However, processing point-cloud data is more complicated and time-consuming than processing 2D images. Hence, state-of-the-art 3D trajectory prediction methods using point-cloud data suffer from slow and erroneous predictions. This paper introduces TrajectoryNAS, a pioneering method that focuses on utilizing point-cloud data for trajectory prediction. By leveraging Neural Architecture Search (NAS), TrajectoryNAS automates the design of trajectory prediction models, encompassing object detection, tracking, and forecasting in a cohesive manner. This approach not only addresses the complex interdependencies among these tasks but also emphasizes the importance of accuracy and efficiency in trajectory modeling. Through empirical studies, TrajectoryNAS demonstrates its effectiveness in enhancing the performance of autonomous driving systems, marking a significant advancement in the field. Experimental results reveal that TrajectoryNAS yields a minimum of 4.8 higher accuracy and 1.1× lower latency over competing methods on the NuScenes dataset.
https://arxiv.org/abs/2403.11695
The supernet is a core component in many recent Neural Architecture Search (NAS) methods. It not only helps embody the search space but also provides a (relative) estimation of the final performance of candidate architectures. Thus, it is critical that the top architectures ranked by a supernet be consistent with those ranked by true performance, which is known as the order-preserving ability. In this work, we analyze the order-preserving ability on the whole search space (global) and on a sub-space of top architectures (local), and empirically show that the local order-preserving ability of current two-stage NAS methods still needs improvement. To rectify this, we propose the novel concept of Supernet Shifting, a refined search strategy combining architecture search with supernet fine-tuning. Specifically, in addition to evaluation, the training loss is accumulated during the search, and the supernet is updated at every iteration. Since superior architectures are sampled more frequently in evolutionary search, the supernet is encouraged to focus on top architectures, thus improving local order preservation. Besides, a pre-trained supernet is often not reusable for one-shot methods. We show that Supernet Shifting can transfer a supernet to a new dataset: specifically, the final classifier layer is reset and trained through evolutionary search. Comprehensive experiments show that our method has better order-preserving ability and can find a dominating architecture. Moreover, the pre-trained supernet can be easily transferred to a new dataset with no loss of performance.
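A toy sketch of the shifting idea, with per-operation scores standing in for a real supernet (all names, numbers, and the update rule are illustrative): because superior candidates are sampled more often during evolutionary-style search, they also receive more supernet updates, concentrating the estimate on the top of the ranking.

```python
import random

random.seed(0)

OPS = ["op_a", "op_b", "op_c"]
TRUE_SCORE = {"op_a": 0.3, "op_b": 0.9, "op_c": 0.5}   # hidden ground truth

supernet = {op: 0.0 for op in OPS}  # per-op fitness estimate, refined online

def evaluate(op):
    """Noisy one-shot estimate of an architecture's quality."""
    return TRUE_SCORE[op] + random.gauss(0.0, 0.05)

for step in range(300):
    # Evolutionary-style sampling: mostly exploit the current best, sometimes
    # explore, so superior ops are sampled (and hence updated) more often.
    op = random.choice(OPS) if random.random() < 0.3 else max(supernet, key=supernet.get)
    # "Shifting" step: every sampled architecture also updates the supernet.
    supernet[op] += 0.1 * (evaluate(op) - supernet[op])
```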
https://arxiv.org/abs/2403.11380
Image segmentation is one of the most fundamental problems in computer vision and has drawn a lot of attention due to its vast applications in image understanding and autonomous driving. However, designing effective and efficient segmentation neural architectures is a labor-intensive process that may require many trials by human experts. In this paper, we address the challenge of efficiently integrating multi-head self-attention into high-resolution representation CNNs by leveraging architecture search. Manually replacing convolution layers with multi-head self-attention is non-trivial due to the costly memory overhead of maintaining high resolution. By contrast, we develop a multi-target multi-branch supernet method, which not only fully utilizes the advantages of high-resolution features, but also finds the proper location for placing the multi-head self-attention module. Our search algorithm is optimized towards multiple objectives (e.g., latency and mIoU) and capable of finding architectures on the Pareto frontier with an arbitrary number of branches in a single search. We further present a series of models via the Hybrid Convolutional-Transformer Architecture Search (HyCTAS) method that searches for the best hybrid combination of light-weight convolution layers and memory-efficient self-attention layers between branches from different resolutions, fusing them to high resolution for both efficiency and effectiveness. Extensive experiments demonstrate that HyCTAS outperforms previous methods on the semantic segmentation task. Code and models are available at \url{this https URL}.
https://arxiv.org/abs/2403.10413
Benchmarking plays a pivotal role in assessing and enhancing the performance of compact deep learning models designed for execution on resource-constrained devices, such as microcontrollers. Our study introduces a novel, entirely artificially generated benchmarking dataset tailored for speech recognition, representing a core challenge in the field of tiny deep learning. SpokeN-100 consists of spoken numbers from 0 to 99 spoken by 32 different speakers in four different languages, namely English, Mandarin, German and French, resulting in 12,800 audio samples. We determine auditory features and use UMAP (Uniform Manifold Approximation and Projection for Dimension Reduction) as a dimensionality reduction method to show the diversity and richness of the dataset. To highlight the use case of the dataset, we introduce two benchmark tasks: given an audio sample, classify (i) the used language and/or (ii) the spoken number. We optimized state-of-the-art deep neural networks and performed an evolutionary neural architecture search to find tiny architectures optimized for the 32-bit ARM Cortex-M4 nRF52840 microcontroller. Our results represent the first benchmark data achieved for SpokeN-100.
https://arxiv.org/abs/2403.09753
Sparse neural networks have shown similar or better generalization performance than their dense counterparts while having higher parameter efficiency. This has motivated a number of works to learn, induce, or search for high-performing sparse networks. While reports of quality or efficiency gains are impressive, standard baselines are lacking, hindering reliable comparability and reproducibility across methods. In this work, we provide an evaluation approach and a naive Random Search baseline method for finding good sparse configurations. We apply Random Search on the node space of an overparameterized network with the goal of finding better-initialized sparse sub-networks that are positioned more advantageously in the loss landscape. We record sparse network post-training performances at various levels of sparsity and compare against both their fully connected parent networks and random sparse configurations at the same sparsity levels. We observe that, for this architecture search task, initialized sparse networks found by Random Search neither perform better nor converge more efficiently than their random counterparts. Thus we conclude that Random Search may be viewed as a suitable neutral baseline for sparsity search methods.
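The baseline can be sketched in a few lines (the scoring function below is a synthetic stand-in; a real run would train each masked sub-network before scoring it):

```python
import random

random.seed(0)

NUM_NODES = 20
KEEP = 8  # nodes retained per mask, i.e. a fixed 60% node sparsity

def random_mask():
    """A random node-level sparsity pattern over one wide layer."""
    kept = set(random.sample(range(NUM_NODES), KEEP))
    return [1 if i in kept else 0 for i in range(NUM_NODES)]

def toy_score(mask):
    """Synthetic stand-in for post-training validation accuracy."""
    return sum(i * m for i, m in enumerate(mask))

masks = [random_mask() for _ in range(100)]
best = max(masks, key=toy_score)  # the configuration Random Search "found"
```

The paper's point is that `best`, once trained, does no better than a single fresh `random_mask()` at the same sparsity, which is what makes Random Search a neutral baseline.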
https://arxiv.org/abs/2403.08265
Neural architecture search automates the design of neural network architectures, usually by exploring a large and thus complex architecture search space. To advance architecture search, we present a graph diffusion-based NAS approach that uses discrete conditional graph diffusion processes to generate high-performing neural network architectures. We then propose a multi-conditioned classifier-free guidance approach applied to graph diffusion networks to jointly impose constraints such as high accuracy and low hardware latency. Unlike related work, our method is completely differentiable and requires only a single model training. In our evaluations, we show promising results on six standard benchmarks, yielding novel and unique architectures at high speed, i.e. less than 0.2 seconds per architecture. Furthermore, we demonstrate the generalisability and efficiency of our method through experiments on the ImageNet dataset.
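The multi-conditioned classifier-free guidance step can be illustrated on scalars (this is the standard CFG extension to several conditions; the paper's exact weighting scheme may differ):

```python
def multi_cfg(uncond, conds, weights):
    """Multi-conditioned classifier-free guidance on a scalar prediction:
    eps = eps_uncond + sum_i w_i * (eps_cond_i - eps_uncond)."""
    return uncond + sum(w * (c - uncond) for c, w in zip(conds, weights))

# Illustrative numbers: an accuracy condition pulls the denoising prediction
# one way, a latency condition another, each with its own guidance weight.
eps_uncond = 0.0
eps_acc, eps_lat = 1.0, -0.5
guided = multi_cfg(eps_uncond, [eps_acc, eps_lat], [2.0, 1.0])
# 0 + 2*(1 - 0) + 1*(-0.5 - 0) = 1.5
assert abs(guided - 1.5) < 1e-9
```

In the graph-diffusion setting the same combination is applied to the model's per-step predictions over discrete graph states, steering sampling toward architectures satisfying all conditions at once.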
https://arxiv.org/abs/2403.06020
We present ECToNAS, a cost-efficient evolutionary cross-topology neural architecture search algorithm that does not require any pre-trained meta controllers. Our framework is able to select suitable network architectures for different tasks and hyperparameter settings, independently performing cross-topology optimisation where required. It is a hybrid approach that fuses training and topology optimisation together into one lightweight, resource-friendly process. We demonstrate the validity and power of this approach with six standard data sets (CIFAR-10, CIFAR-100, EuroSAT, Fashion MNIST, MNIST, SVHN), showcasing the algorithm's ability to not only optimise the topology within an architectural type, but also to dynamically add and remove convolutional cells when and where required, thus crossing boundaries between different network types. This enables researchers without a background in machine learning to make use of appropriate model types and topologies and to apply machine learning methods in their domains, with a computationally cheap, easy-to-use cross-topology neural architecture search framework that fully encapsulates the topology optimisation within the training process.
https://arxiv.org/abs/2403.05123
The existing graph neural architecture search (GNAS) methods heavily rely on supervised labels during the search process, failing to handle ubiquitous scenarios where supervisions are not available. In this paper, we study the problem of unsupervised graph neural architecture search, which remains unexplored in the literature. The key problem is to discover the latent graph factors that drive the formation of graph data as well as the underlying relations between the factors and the optimal neural architectures. Handling this problem is challenging given that the latent graph factors together with architectures are highly entangled due to the nature of the graph and the complexity of the neural architecture search process. To address the challenge, we propose a novel Disentangled Self-supervised Graph Neural Architecture Search (DSGAS) model, which is able to discover the optimal architectures capturing various latent graph factors in a self-supervised fashion based on unlabeled graph data. Specifically, we first design a disentangled graph super-network capable of incorporating multiple architectures with factor-wise disentanglement, which are optimized simultaneously. Then, we estimate the performance of architectures under different factors by our proposed self-supervised training with joint architecture-graph disentanglement. Finally, we propose a contrastive search with architecture augmentations to discover architectures with factor-specific expertise. Extensive experiments on 11 real-world datasets demonstrate that the proposed model is able to achieve state-of-the-art performance against several baseline methods in an unsupervised manner.
https://arxiv.org/abs/2403.05064
Training-free metrics (a.k.a. zero-cost proxies) are widely used to avoid resource-intensive neural network training, especially in Neural Architecture Search (NAS). Recent studies show that existing training-free metrics have several limitations, such as limited correlation and poor generalisation across different search spaces and tasks. Hence, we propose Sample-Wise Activation Patterns and its derivative, SWAP-Score, a novel high-performance training-free metric. It measures the expressivity of networks over a batch of input samples. The SWAP-Score is strongly correlated with ground-truth performance across various search spaces and tasks, outperforming 15 existing training-free metrics on NAS-Bench-101/201/301 and TransNAS-Bench-101. The SWAP-Score can be further enhanced by regularisation, which leads to even higher correlations in cell-based search space and enables model size control during the search. For example, Spearman's rank correlation coefficient between regularised SWAP-Score and CIFAR-100 validation accuracies on NAS-Bench-201 networks is 0.90, significantly higher than 0.80 from the second-best metric, NWOT. When integrated with an evolutionary algorithm for NAS, our SWAP-NAS achieves competitive performance on CIFAR-10 and ImageNet in approximately 6 minutes and 9 minutes of GPU time respectively.
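The underlying notion of measuring expressivity via activation patterns over a batch can be sketched as follows (a simplified illustration of the idea, not the exact SWAP-Score computation):

```python
import random

random.seed(0)

def random_layer(n_out, n_in):
    return [[random.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]

def relu_pattern(x, weights):
    """Binary pattern of which hidden units fire for one input sample."""
    pattern, h = [], x
    for layer in weights:
        h = [max(0.0, sum(w * v for w, v in zip(row, h))) for row in layer]
        pattern.extend(1 if v > 0 else 0 for v in h)
    return tuple(pattern)

weights = [random_layer(16, 8), random_layer(16, 16)]               # untrained net
batch = [[random.gauss(0, 1) for _ in range(8)] for _ in range(32)]  # input batch
patterns = {relu_pattern(x, weights) for x in batch}
score = len(patterns)  # expressivity proxy: distinct patterns over the batch
```

No training is needed: a single forward pass of the batch through the untrained network yields the score, which is what makes such metrics attractive for NAS.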
https://arxiv.org/abs/2403.04161
Neural network models have a number of hyperparameters that must be chosen along with their architecture. Choosing an architecture and values for its hyperparameters can be a heavy burden on a novice user; in most cases, default hyperparameters and architectures are used. Significant improvements to model accuracy can be achieved through the evaluation of multiple architectures. A process known as Neural Architecture Search (NAS) may be applied to automatically evaluate a large number of such architectures. A system integrating open source tools for Neural Architecture Search (OpenNAS) in the classification of images has been developed as part of this research. OpenNAS takes any dataset of grayscale or RGB images and generates Convolutional Neural Network (CNN) architectures based on a range of metaheuristics, using either AutoKeras, transfer learning, or a Swarm Intelligence (SI) approach. Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO) are used as the SI algorithms. Furthermore, models developed through such metaheuristics may be combined using stacking ensembles. In the context of this paper, we focus on training and optimizing CNNs using the Swarm Intelligence (SI) components of OpenNAS. Two major types of SI algorithms, namely PSO and ACO, are compared to see which is more effective in generating higher model accuracies. We show, with our experimental design, that the PSO algorithm performs better than ACO; the performance improvement of PSO is most notable on a more complex dataset. As a baseline, the performance of fine-tuned pre-trained models is also evaluated.
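A minimal PSO loop of the kind an SI-based search relies on, with a toy objective standing in for validation error over two continuous architecture parameters (hyperparameters and dimensions are illustrative, not OpenNAS's settings):

```python
import random

random.seed(0)

def objective(pos):
    """Toy stand-in for validation error; optimum at (3, 3)."""
    return sum((x - 3.0) ** 2 for x in pos)

DIM, SWARM, STEPS = 2, 12, 120
W, C1, C2 = 0.7, 1.5, 1.5  # inertia, cognitive and social coefficients

pos = [[random.uniform(-10, 10) for _ in range(DIM)] for _ in range(SWARM)]
vel = [[0.0] * DIM for _ in range(SWARM)]
pbest = [p[:] for p in pos]               # each particle's best position so far
gbest = min(pbest, key=objective)[:]      # swarm-wide best position so far

for _ in range(STEPS):
    for i in range(SWARM):
        for d in range(DIM):
            r1, r2 = random.random(), random.random()
            # Velocity update: inertia + pull toward personal and global bests.
            vel[i][d] = (W * vel[i][d]
                         + C1 * r1 * (pbest[i][d] - pos[i][d])
                         + C2 * r2 * (gbest[d] - pos[i][d]))
            pos[i][d] += vel[i][d]
        if objective(pos[i]) < objective(pbest[i]):
            pbest[i] = pos[i][:]
    gbest = min(pbest, key=objective)[:]
```

For architecture search, `pos` would encode discrete design choices (layer counts, filter sizes) via rounding or a decoding scheme, and `objective` would be a short training-and-validation run.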
https://arxiv.org/abs/2403.03781
Predictor-based methods have substantially enhanced Neural Architecture Search (NAS) optimization. The efficacy of these predictors is largely influenced by the method of encoding neural network architectures. While traditional encodings used an adjacency matrix describing the graph structure of a neural network, novel encodings embrace a variety of approaches from unsupervised pretraining of latent representations to vectors of zero-cost proxies. In this paper, we categorize and investigate neural encodings from three main types: structural, learned, and score-based. Furthermore, we extend these encodings and introduce \textit{unified encodings}, that extend NAS predictors to multiple search spaces. Our analysis draws from experiments conducted on over 1.5 million neural network architectures on NAS spaces such as NASBench-101 (NB101), NB201, NB301, Network Design Spaces (NDS), and TransNASBench-101. Building on our study, we present our predictor \textbf{FLAN}: \textbf{Fl}ow \textbf{A}ttention for \textbf{N}AS. FLAN integrates critical insights on predictor design, transfer learning, and \textit{unified encodings} to enable more than an order of magnitude cost reduction for training NAS accuracy predictors. Our implementation and encodings for all neural networks are open-sourced at \href{this https URL}{this https URL\_nas}.
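A structural ("adjacency") encoding of the kind the paper categorizes can be sketched for a small cell (NAS-Bench-101-style; the operation vocabulary is illustrative):

```python
OPS = ["conv3x3", "conv1x1", "maxpool"]  # illustrative operation vocabulary

def encode(adj, ops):
    """Flatten the upper triangle of the DAG adjacency matrix and append a
    one-hot vector per labelled node (classic structural NAS encoding)."""
    n = len(adj)
    flat = [adj[i][j] for i in range(n) for j in range(i + 1, n)]
    for op in ops:
        one_hot = [0] * len(OPS)
        one_hot[OPS.index(op)] = 1
        flat.extend(one_hot)
    return flat

# A 4-node cell: input -> conv3x3 -> maxpool -> output (inner nodes labelled).
adj = [[0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1],
       [0, 0, 0, 0]]
vec = encode(adj, ["conv3x3", "maxpool"])
```

Learned and score-based encodings replace this fixed vector with latent embeddings or zero-cost-proxy values; a unified encoding, as studied in the paper, must additionally make such vectors comparable across search spaces with different node counts and operation sets.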
https://arxiv.org/abs/2403.02484
Efficient deployment of neural networks (NN) requires the co-optimization of accuracy and latency. For example, hardware-aware neural architecture search has been used to automatically find NN architectures that satisfy a latency constraint on a specific hardware device. Central to these search algorithms is a prediction model that is designed to provide a hardware latency estimate for a candidate NN architecture. Recent research has shown that the sample efficiency of these predictive models can be greatly improved through pre-training on some \textit{training} devices with many samples, and then transferring the predictor to the \textit{test} (target) device. Transfer learning and meta-learning methods have been used for this, but often exhibit significant performance variability. Additionally, the evaluation of existing latency predictors has been largely done on hand-crafted training/test device sets, making it difficult to ascertain design features that compose a robust and general latency predictor. To address these issues, we introduce a comprehensive suite of latency prediction tasks obtained in a principled way through automated partitioning of hardware device sets. We then design a general latency predictor to comprehensively study (1) the predictor architecture, (2) NN sample selection methods, (3) hardware device representations, and (4) NN operation encoding schemes. Building on conclusions from our study, we present an end-to-end latency predictor training strategy that outperforms existing methods on 11 out of 12 difficult latency prediction tasks, improving latency prediction by 22.5\% on average, and by up to 87.6\% on the hardest tasks. Focusing on latency prediction, our HW-Aware NAS reports a $5.8\times$ speedup in wall-clock time. Our code is available on \href{this https URL}{this https URL\_latency}.
https://arxiv.org/abs/2403.02446
Video motion magnification is a technique to capture and amplify subtle motion in a video that is invisible to the naked eye. Deep learning-based prior work successfully demonstrates modelling of the motion magnification problem with outstanding quality compared to conventional signal processing-based approaches. However, it still lags behind real-time performance, which prevents it from being extended to various online applications. In this paper, we investigate an efficient deep learning-based motion magnification model that runs in real time for full-HD resolution videos. Due to the specific network design of the prior art, i.e. its inhomogeneous architecture, the direct application of existing neural architecture search methods is complicated. Instead of automatic search, we carefully investigate the architecture module by module for its role and importance in the motion magnification task. Two key findings are that 1) reducing the spatial resolution of the latent motion representation in the decoder provides a good trade-off between computational efficiency and task quality, and 2) surprisingly, only a single linear layer and a single branch in the encoder are sufficient for the motion magnification task. Based on these findings, we introduce a real-time deep learning-based motion magnification model with 4.2× fewer FLOPs that is 2.7× faster than the prior art while maintaining comparable quality.
https://arxiv.org/abs/2403.01898