Abstract
Channel configuration search, the optimization of layer specifications such as layer widths in deep neural networks, presents a complex combinatorial challenge constrained by tensor shape compatibility and computational budgets. We posit that Large Language Models (LLMs) offer a transformative approach to Neural Architecture Search (NAS), capable of reasoning about architectural code structure in ways that traditional heuristics cannot. In this paper, we investigate the application of an LLM-driven NAS framework to the problem of channel configuration. We formulate the search as a sequence of conditional code generation tasks, in which an LLM refines architectural specifications based on performance telemetry. Crucially, we address the data scarcity problem by generating a vast corpus of valid, shape-consistent architectures via Abstract Syntax Tree (AST) mutations. While these mutated networks are not necessarily high-performing, they provide the critical volume of structural data required for the LLM to learn the latent relationship between channel configurations and model performance. This allows the LLM to internalize complex design patterns and apply them to optimize feature extraction strategies. Experimental results on CIFAR-100 validate the efficacy of this approach, demonstrating that the model yields statistically significant improvements in accuracy. Our analysis confirms that the LLM successfully acquires domain-specific architectural priors, distinguishing this method from random search and highlighting the potential of language-driven design in deep learning.
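To make the AST-mutation idea concrete, the following is a minimal hypothetical sketch (not the paper's actual implementation) of how channel widths in a model-definition snippet could be mutated while preserving tensor shape compatibility: each layer's new output width is propagated into the next layer's input width. The `Conv2d` argument convention (in-channels, out-channels, kernel size) and the candidate width set are assumptions for illustration.

```python
import ast
import random

# A toy architecture specification as source code (assumed format).
SOURCE = """
model = Sequential(
    Conv2d(3, 32, 3),
    Conv2d(32, 64, 3),
    Conv2d(64, 128, 3),
)
"""

class ChannelMutator(ast.NodeTransformer):
    """Randomly mutate the out-channel width of each Conv2d call,
    propagating it into the next layer's in-channels so the
    resulting architecture stays shape-consistent."""

    def __init__(self, rng):
        self.rng = rng
        self.prev_out = None  # out-channels of the previous conv layer

    def visit_Call(self, node):
        self.generic_visit(node)  # visit nested calls (layers) in order
        if isinstance(node.func, ast.Name) and node.func.id == "Conv2d":
            in_arg, out_arg = node.args[0], node.args[1]
            if self.prev_out is not None:
                in_arg.value = self.prev_out  # match previous layer's output
            out_arg.value = self.rng.choice([16, 32, 64, 128, 256])
            self.prev_out = out_arg.value
        return node

def mutate(source: str, seed: int = 0) -> str:
    """Return a mutated, shape-consistent variant of the architecture."""
    tree = ast.parse(source)
    ChannelMutator(random.Random(seed)).visit(tree)
    return ast.unparse(tree)
```

Because mutations act on the syntax tree rather than raw text, every generated variant parses and preserves the in/out channel chaining invariant, which is what makes the corpus usable as structural training data even when individual networks perform poorly.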
URL
https://arxiv.org/abs/2601.08517