Abstract
Interactive 3D generation is attracting growing attention for its potential to create immersive virtual experiences. However, a critical challenge for current 3D generation methods is achieving real-time interactivity. To address this, we introduce WonderTurbo, the first real-time interactive 3D scene generation framework, capable of generating novel views of a 3D scene within 0.72 seconds. Specifically, WonderTurbo accelerates both the geometric and the appearance modeling stages of 3D scene generation. For geometry, we propose StepSplat, which builds an efficient 3D geometric representation through dynamic updates, each taking only 0.26 seconds. We also design QuickDepth, a lightweight depth completion module that supplies consistent depth input to StepSplat, further improving geometric accuracy. For appearance modeling, we develop FastPaint, a two-step diffusion model tailored for instant inpainting that focuses on maintaining spatial appearance consistency. Experimental results show that WonderTurbo achieves a 15× speedup over baseline methods while preserving spatial consistency and delivering high-quality output.
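The timing figures quoted above can be put side by side with a little arithmetic. The following is only a rough sketch, not the authors' benchmark: it assumes the 15× speedup refers to per-view generation time and that one 0.26-second StepSplat update is included in the 0.72-second total, neither of which the abstract states explicitly. The function names are hypothetical.

```python
# Figures quoted in the abstract.
TOTAL_SECONDS_PER_VIEW = 0.72  # end-to-end time to generate a novel view
STEPSPLAT_SECONDS = 0.26       # one StepSplat geometry update
SPEEDUP = 15.0                 # reported speedup over baseline methods


def implied_baseline_seconds(total: float, speedup: float) -> float:
    """Baseline per-view latency implied by the speedup.

    Assumption: the speedup is measured on per-view generation time.
    """
    return total * speedup


def remaining_budget_seconds(total: float, geometry: float) -> float:
    """Time left for all other stages after one geometry update.

    Assumption: the geometry update is counted inside the total.
    """
    return total - geometry


if __name__ == "__main__":
    baseline = implied_baseline_seconds(TOTAL_SECONDS_PER_VIEW, SPEEDUP)
    rest = remaining_budget_seconds(TOTAL_SECONDS_PER_VIEW, STEPSPLAT_SECONDS)
    print(f"implied baseline latency: {baseline:.2f} s per view")
    print(f"budget outside StepSplat: {rest:.2f} s per view")
```

Under these assumptions, the baseline would take on the order of ten seconds per view, and a bit under half a second per view would remain for depth completion, inpainting, and any other overhead.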
URL
https://arxiv.org/abs/2504.02261