Abstract
Intent-Based Networking (IBN) allows operators to specify high-level network goals rather than low-level configurations. While recent work demonstrates that large language models can automate configuration tasks, a distinct class of intents requires generating optimization code to compute provably optimal solutions for traffic engineering, routing, and resource allocation. Current systems assume text-based intent expression, requiring operators to enumerate topologies and parameters in prose. Network practitioners naturally reason about structure through diagrams, yet whether Vision-Language Models (VLMs) can process annotated network sketches into correct optimization code remains unexplored. We present IntentOpt, a benchmark of 85 optimization problems across 17 categories, evaluating four VLMs (GPT-5-Mini, Claude-Haiku-4.5, Gemini-2.5-Flash, Llama-3.2-11B-Vision) under three prompting strategies on multimodal versus text-only inputs. Our evaluation shows that visual parameter extraction reduces execution success by 12-21 percentage points (pp), with GPT-5-Mini dropping from 93% to 72%. Program-of-thought prompting decreases performance by up to 13 pp, and open-source models lag behind closed-source ones, with Llama-3.2-11B-Vision reaching 18% compared to 75% for GPT-5-Mini. These results establish baseline capabilities and limitations of current VLMs for optimization code generation within an IBN system. We also demonstrate practical feasibility through a case study that deploys VLM-generated code to network testbed infrastructure using Model Context Protocol.
URL
https://arxiv.org/abs/2601.12744