Abstract
We present PhysGen, a novel image-to-video generation method that converts a single image and an input condition (e.g., force and torque applied to an object in the image) into a realistic, physically plausible, and temporally consistent video. Our key insight is to integrate model-based physical simulation with a data-driven video generation process, enabling plausible image-space dynamics. At the heart of our system are three core components: (i) an image understanding module that effectively captures the geometry, materials, and physical parameters of the image; (ii) an image-space dynamics simulation model that utilizes rigid-body physics and inferred parameters to simulate realistic behaviors; and (iii) an image-based rendering and refinement module that leverages generative video diffusion to produce realistic video footage featuring the simulated motion. The resulting videos are realistic in both physics and appearance and are precisely controllable, showing superior results over existing data-driven image-to-video generation works in quantitative comparisons and a comprehensive user study. PhysGen's resulting videos can be used for various downstream applications, such as turning an image into a realistic animation or allowing users to interact with the image and create various dynamics. Project page: this https URL
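To make the three-stage pipeline concrete, below is a minimal Python sketch of how the components described in the abstract could be wired together. All names (SceneState, understand_image, simulate, render_and_refine, physgen) are illustrative assumptions, not the authors' actual API; the perception and diffusion stages are stubbed out, and the rigid-body step uses a simple semi-implicit Euler integrator as a stand-in for the paper's simulator.

    # Hypothetical sketch of the PhysGen pipeline; names and signatures
    # are assumptions, not the authors' implementation.
    from dataclasses import dataclass

    @dataclass
    class SceneState:
        """Per-object 2D rigid-body state inferred from the input image."""
        position: tuple[float, float]
        velocity: tuple[float, float] = (0.0, 0.0)
        angle: float = 0.0
        angular_velocity: float = 0.0
        mass: float = 1.0
        inertia: float = 1.0

    def understand_image(image) -> list[SceneState]:
        """Stage (i): infer geometry, materials, and physical parameters.
        Stubbed here; the paper uses perception models for this step."""
        return [SceneState(position=(0.0, 0.0))]

    def simulate(objects, force, torque, dt=1.0 / 30.0, steps=60):
        """Stage (ii): image-space rigid-body dynamics under the user's
        input force and torque (semi-implicit Euler integration)."""
        trajectory = []
        for _ in range(steps):
            for obj in objects:
                ax, ay = force[0] / obj.mass, force[1] / obj.mass
                obj.velocity = (obj.velocity[0] + ax * dt,
                                obj.velocity[1] + ay * dt)
                obj.position = (obj.position[0] + obj.velocity[0] * dt,
                                obj.position[1] + obj.velocity[1] * dt)
                obj.angular_velocity += (torque / obj.inertia) * dt
                obj.angle += obj.angular_velocity * dt
            trajectory.append([(o.position, o.angle) for o in objects])
        return trajectory

    def render_and_refine(image, trajectory):
        """Stage (iii): image-based rendering plus refinement; the paper
        uses generative video diffusion here. Stubbed as identity frames."""
        return [image for _ in trajectory]

    def physgen(image, force=(1.0, 0.0), torque=0.0):
        objects = understand_image(image)
        trajectory = simulate(objects, force, torque)
        return render_and_refine(image, trajectory)

The design point this sketch illustrates is the abstract's key insight: the motion comes from an explicit, controllable physics simulation (stage ii), while photorealism comes from a learned generative model (stage iii), rather than asking one video model to do both.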
URL
https://arxiv.org/abs/2409.18964