Abstract
Recent advances in Large Language Models (LLMs) have opened new perspectives for automation in optimization. While several studies have explored how LLMs can generate or solve optimization models, far less is understood about what these models actually learn regarding problem structure or algorithmic behavior. This study investigates how LLMs internally represent combinatorial optimization problems and whether such representations can support downstream decision tasks. We adopt a twofold methodology combining direct querying, which assesses the LLMs' capacity to explicitly extract instance features, with probing analyses that examine whether such information is implicitly encoded within their hidden layers. The probing framework is further extended to a per-instance algorithm selection task, evaluating whether LLM-derived representations can predict the best-performing solver. Experiments span four benchmark problems and three instance representations. Results show that LLMs exhibit a moderate ability to recover feature information from problem instances, whether through direct querying or probing. Notably, the predictive power of LLM hidden-layer representations proves comparable to that achieved through traditional feature extraction, suggesting that LLMs capture meaningful structural information relevant to optimization performance.
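The probing idea described above can be illustrated with a minimal, self-contained sketch. The paper's actual pipeline extracts hidden-layer activations from an LLM for each problem instance; here, as a stand-in, synthetic vectors with one linearly decodable dimension play the role of those activations, and a least-squares linear probe is fit to read out a binary label (e.g. an instance feature, or which of two solvers performs best). All names and dimensions are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Toy stand-in for the probing setup: random vectors play the role of
# frozen LLM hidden states, and a linear probe tries to read out a
# label (an instance feature, or the best-performing solver).
rng = np.random.default_rng(0)
n_train, n_test, d = 300, 100, 64
y = rng.integers(0, 2, size=n_train + n_test).astype(float)
X = rng.normal(size=(n_train + n_test, d))
X[:, 0] += 3.0 * y  # inject a linearly decodable signal into one dimension

# Least-squares linear probe with a bias term; the representation itself
# stays frozen, only the probe's weights are fit.
Xb = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(Xb[:n_train], y[:n_train], rcond=None)

# Threshold the probe's output at 0.5 to get a class prediction.
pred = (Xb[n_train:] @ w > 0.5).astype(float)
acc = (pred == y[n_train:]).mean()
print(f"probe accuracy: {acc:.2f}")
```

High probe accuracy indicates that the label is linearly decodable from the representation, which is the sense in which the abstract speaks of information being "implicitly encoded" in the hidden layers.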
URL
https://arxiv.org/abs/2512.13374