Abstract
Our project page: this https URL. Automated generation of complex, interactive indoor scenes tailored to user prompts remains a formidable challenge. While existing methods achieve indoor scene synthesis, they struggle with rigid editing constraints, physical incoherence, excessive human effort, single-room limitations, and suboptimal material quality. To address these limitations, we propose SceneLCM, an end-to-end framework that synergizes a Large Language Model (LLM) for layout design with a Latent Consistency Model (LCM) for scene optimization. Our approach decomposes scene generation into four modular pipelines: (1) Layout Generation. We employ LLM-guided 3D spatial reasoning to convert textual descriptions into parametric blueprints (3D layouts), and a programmatic validation mechanism iteratively refines layout parameters through LLM-mediated dialogue loops; (2) Furniture Generation. SceneLCM employs Consistency Trajectory Sampling (CTS), a consistency distillation sampling loss guided by the LCM, to form fast, semantically rich, and high-quality representations. We also offer two theoretical justifications demonstrating that our CTS loss is equivalent to the consistency loss and that its distillation error is bounded by the truncation error of the Euler solver; (3) Environment Optimization. We use a multiresolution texture field to encode the appearance of the scene, optimized via the CTS loss. To maintain cross-geometry texture coherence, we introduce a normal-aware cross-attention decoder that predicts RGB by cross-attending to anchor locations in geometrically heterogeneous instances; (4) Physical Editing. SceneLCM supports physical editing by integrating physical simulation, achieving persistent physical realism. Extensive experiments validate SceneLCM's superiority over state-of-the-art techniques, showing its wide-ranging potential for diverse applications.
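To make the CTS idea concrete, the following is a minimal toy sketch of a consistency-distillation-style sampling loss: perturb a rendered latent to a noise level t, query a (frozen) consistency model for its one-step clean estimate, and penalize the distance between the render and that estimate. The `consistency_fn` placeholder is a hypothetical stand-in for the paper's pretrained LCM, and the loss form is an illustrative assumption, not the exact objective from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a frozen LCM's consistency function
# f(x_t, t) -> estimate of the clean sample x_0. The real model is a
# neural network; this placeholder merely rescales its input.
def consistency_fn(x_t, t):
    return x_t / (1.0 + t)

def cts_loss(rendered, t):
    """Toy consistency-distillation sampling loss: noise the rendered
    latent to level t, take the consistency model's one-step clean
    estimate as the target, and return the mean squared distance."""
    noise = rng.standard_normal(rendered.shape)
    x_t = rendered + t * noise           # Euler-style forward perturbation
    target = consistency_fn(x_t, t)      # one-step denoised estimate
    return float(np.mean((rendered - target) ** 2))

rendered = np.zeros((4, 4))              # stand-in for a rendered latent
loss = cts_loss(rendered, t=0.5)
```

In the full method this scalar would be backpropagated through a differentiable renderer to update scene parameters; the sketch only shows the loss evaluation itself.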
URL
https://arxiv.org/abs/2506.07091