Abstract
Modeling industrial scenes is essential for simulation in industrial manufacturing. While large language models (LLMs) have made significant progress in generating general 3D scenes from textual descriptions, generating industrial scenes with LLMs poses a unique challenge: such scenes demand precise measurements and positioning, which requires complex planning over spatial arrangement. To address this challenge, we introduce SceneGenAgent, an LLM-based agent for generating industrial scenes through C# code. SceneGenAgent ensures precise layout planning through a structured and calculable format, layout verification, and iterative refinement to meet the quantitative requirements of industrial scenarios. Experimental results demonstrate that LLMs powered by SceneGenAgent exceed their original performance, reaching a success rate of up to 81.0% on real-world industrial scene generation tasks and effectively meeting most scene generation requirements. To further enhance accessibility, we construct SceneInstruct, a dataset designed for fine-tuning open-source LLMs to integrate into SceneGenAgent. Experiments show that fine-tuning open-source LLMs on SceneInstruct yields significant performance improvements, with Llama3.1-70B approaching the capabilities of GPT-4o. Our code and data are available at this https URL.
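The abstract does not spell out how layout verification and iterative refinement work, so the following is only a minimal sketch of the general idea, not SceneGenAgent's implementation. It assumes a layout is a list of axis-aligned floor footprints in metres; the names `Box`, `find_collisions`, and `refine` are hypothetical, and a simple deterministic shift stands in for the step where an LLM would revise the layout.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical axis-aligned footprint of one object on the floor (metres)."""
    name: str
    x: float  # minimum-x corner
    y: float  # minimum-y corner
    w: float  # extent along x
    d: float  # extent along y


def overlaps(a: Box, b: Box) -> bool:
    """True if the footprints share interior area; touching edges are allowed."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.d and b.y < a.y + a.d)


def find_collisions(layout):
    """Verification step: return every pair of object names that overlap."""
    return [(a.name, b.name)
            for i, a in enumerate(layout)
            for b in layout[i + 1:]
            if overlaps(a, b)]


def refine(layout, gap=0.5, max_rounds=10):
    """Refinement loop: re-verify, then shift the second object of each
    colliding pair to the +x side of the first, until the layout passes
    verification or the round budget runs out.

    Assumption: in an agent, this shift would be replaced by asking the
    LLM to revise the layout given the list of collisions.
    """
    for _ in range(max_rounds):
        collisions = find_collisions(layout)
        if not collisions:
            return True
        by_name = {box.name: box for box in layout}
        for first, second in collisions:
            a, b = by_name[first], by_name[second]
            b.x = a.x + a.w + gap
    return False


layout = [
    Box("robot", 0.0, 0.0, 2.0, 2.0),
    Box("conveyor", 1.0, 0.5, 4.0, 1.0),
    Box("table", 4.0, 0.0, 1.5, 1.5),
]
ok = refine(layout)
```

Because every position is an explicit number, the verification step is a plain calculation rather than a judgment call, which is the property a "structured and calculable format" is meant to provide.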
URL
https://arxiv.org/abs/2410.21909