Abstract
Recent advances in large language models (LLMs) enable compelling story generation, but connecting narrative text to playable visual environments remains an open challenge in procedural content generation (PCG). We present a lightweight pipeline that transforms short narrative prompts into a sequence of 2D tile-based game scenes that reflect the temporal structure of the story. Given an LLM-generated narrative, our system identifies three key time frames, extracts spatial predicates in the form of "Object-Relation-Object" triples, and retrieves visual assets using affordance-aware semantic embeddings from the GameTileNet dataset. Layered terrain is generated using cellular automata, and objects are placed using spatial rules grounded in the predicate structure. We evaluated our system on ten diverse stories, analyzing tile-object matching, affordance-layer alignment, and spatial constraint satisfaction across frames. This prototype offers a scalable approach to narrative-driven scene generation and lays the foundation for future work on multi-frame continuity, symbolic tracking, and multi-agent coordination in story-centered PCG.
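The abstract does not include implementation details, but a minimal Python sketch of the two most concrete components it names (cellular-automata terrain generation and rule-based placement of "Object-Relation-Object" triples) is given below. The function names, the smoothing threshold, the two toy relations, and the example triples are illustrative assumptions, not the paper's actual code.

    import random

    # Hypothetical spatial predicates in "Object-Relation-Object" form, as the
    # abstract describes extracting from a narrative frame (names are made up).
    triples = [("hero", "beside", "campfire"), ("sword", "on", "rock")]

    def generate_terrain(width, height, fill_prob=0.45, steps=4, seed=None):
        # A common cellular-automata scheme: random fill, then repeated smoothing.
        rng = random.Random(seed)
        grid = [[1 if rng.random() < fill_prob else 0 for _ in range(width)]
                for _ in range(height)]
        for _ in range(steps):
            nxt = [[0] * width for _ in range(height)]
            for y in range(height):
                for x in range(width):
                    # Count filled cells in the 3x3 Moore neighborhood;
                    # off-map cells count as filled.
                    filled = sum(
                        grid[y + dy][x + dx]
                        if 0 <= y + dy < height and 0 <= x + dx < width else 1
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                        if not (dy == 0 and dx == 0)
                    )
                    nxt[y][x] = 1 if filled >= 5 else 0  # majority-style smoothing
            grid = nxt
        return grid  # 1 = walkable ground, 0 = water/void

    def place_objects(grid, triples, seed=0):
        # Toy rule-based placement: anchor the reference object on a walkable tile,
        # then put the subject on the same tile ("on") or an adjacent one (otherwise).
        rng = random.Random(seed)
        height, width = len(grid), len(grid[0])
        walkable = {(x, y) for y in range(height) for x in range(width) if grid[y][x] == 1}
        placements = {}
        for subj, rel, obj in triples:
            ox, oy = rng.choice(sorted(walkable))
            placements[obj] = (ox, oy)
            if rel == "on":
                placements[subj] = (ox, oy)
            else:  # "beside", "near", ...: use an adjacent walkable tile if available
                adj = [(ox + dx, oy + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                       if (ox + dx, oy + dy) in walkable]
                placements[subj] = rng.choice(adj) if adj else (ox, oy)
        return placements

    terrain = generate_terrain(32, 18, seed=42)
    print(place_objects(terrain, triples))

This sketch omits the paper's layered terrain, affordance-aware asset retrieval, and multi-frame handling; it only illustrates the general style of technique the abstract refers to.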
URL
https://arxiv.org/abs/2509.04481