Abstract
Automatic creation of 3D scenes for immersive VR presence has been a significant research focus for decades. However, existing methods often rely on either high-poly mesh modeling with post-hoc simplification or massive 3D Gaussians, resulting in a complex pipeline or limited visual realism. In this paper, we demonstrate that such exhaustive modeling is unnecessary for achieving a compelling immersive experience. We introduce ImmerseGen, a novel agent-guided framework for compact and photorealistic world modeling. ImmerseGen represents scenes as hierarchical compositions of lightweight geometric proxies, i.e., simplified terrain and billboard meshes, and generates photorealistic appearance by synthesizing RGBA textures onto these proxies. Specifically, we propose terrain-conditioned texturing for user-centric base world synthesis, and RGBA asset texturing for midground and foreground elements. This reformulation offers several advantages: (i) it simplifies modeling by enabling agents to guide generative models in producing coherent textures that integrate seamlessly with the scene; (ii) it bypasses complex geometry creation and decimation by directly synthesizing photorealistic textures on proxies, preserving visual quality without degradation; (iii) it enables compact representations suitable for real-time rendering on mobile VR headsets. To automate scene creation from text prompts, we introduce VLM-based modeling agents enhanced with semantic grid-based analysis for improved spatial reasoning and accurate asset placement. ImmerseGen further enriches scenes with dynamic effects and ambient audio to support multisensory immersion. Experiments on scene generation and live VR showcases demonstrate that ImmerseGen achieves superior photorealism, spatial coherence, and rendering efficiency compared to prior methods. Project webpage: this https URL.
URL
https://arxiv.org/abs/2506.14315