Abstract
WonderPlay is a novel framework that integrates physics simulation with video generation to synthesize action-conditioned dynamic 3D scenes from a single image. While prior works are restricted to rigid-body or simple elastic dynamics, WonderPlay features a hybrid generative simulator that synthesizes a wide range of 3D dynamics. The hybrid generative simulator first uses a physics solver to simulate coarse 3D dynamics, which then condition a video generator to produce a video with finer, more realistic motion. The generated video is in turn used to update the simulated dynamic 3D scene, closing the loop between the physics solver and the video generator. This approach combines intuitive user control with the accurate dynamics of physics-based simulators and the expressivity of diffusion-based video generators. Experimental results demonstrate that WonderPlay enables users to interact with diverse scenes containing cloth, sand, snow, liquid, smoke, elastic bodies, and rigid bodies -- all from a single image input. Code will be made public. Project website: this https URL
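The closed loop described above (physics solver → video generator → scene update) can be sketched as follows. This is a minimal illustration of the control flow only; every class, method, and data structure here is a hypothetical stand-in, not the authors' actual API, and the stubs exist solely to make the loop runnable.

```python
# Hypothetical sketch of WonderPlay's hybrid generative simulation loop.
# All names below are illustrative assumptions, not the real implementation.

class PhysicsSolver:
    """Stand-in for a physics solver producing coarse 3D dynamics."""
    def simulate(self, scene, action):
        # In the paper, this would be e.g. rigid-body / particle simulation.
        return {"coarse_motion": (scene["state"], action)}

class VideoGenerator:
    """Stand-in for a diffusion video model conditioned on coarse dynamics."""
    def generate(self, image, condition):
        # Produces frames with finer, more realistic motion.
        return {"frames": [image, condition["coarse_motion"]]}

def hybrid_generative_step(scene, action, solver, generator):
    """One closed-loop update: solver -> video generator -> scene."""
    coarse = solver.simulate(scene, action)             # coarse 3D dynamics
    video = generator.generate(scene["image"], coarse)  # refined motion video
    scene["state"] = video["frames"][-1]                # update scene from video
    return scene

scene = {"image": "input.png", "state": "rest"}
scene = hybrid_generative_step(scene, "wind", PhysicsSolver(), VideoGenerator())
```

Iterating `hybrid_generative_step` with user actions is what lets intuitive control, physical accuracy, and generative expressivity coexist in one loop.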
URL
https://arxiv.org/abs/2505.18151