Abstract
The emergence of diffusion models has greatly propelled progress in image and video generation. Recently, efforts have been made toward controllable video generation, including text-to-video generation and video motion control, among which camera motion control is an important topic. However, existing camera motion control methods rely on training a temporal camera module and require substantial computational resources due to the large number of parameters in video generation models. Moreover, existing methods pre-define camera motion types during training, which limits their flexibility in camera control. Therefore, to reduce training costs and achieve flexible camera control, we propose COMD, a novel training-free video motion transfer model that disentangles camera motions and object motions in source videos and transfers the extracted camera motions to new videos. We first propose a one-shot camera motion disentanglement method that extracts camera motion from a single source video: it separates the moving objects from the background and estimates the camera motion in the moving-object regions from the motion in the background by solving a Poisson equation. Furthermore, we propose a few-shot camera motion disentanglement method that extracts the common camera motion from multiple videos with similar camera motions, employing a window-based clustering technique to extract the features shared across the temporal attention maps of the videos. Finally, we propose a motion combination method that composes different types of camera motions, enabling more controllable and flexible camera control. Extensive experiments demonstrate that our training-free approach effectively decouples camera and object motion and applies the decoupled camera motion to a wide range of controllable video generation tasks, achieving flexible and diverse camera motion control.
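The one-shot disentanglement step estimates camera motion inside the moving-object region from the surrounding background motion by solving a Poisson equation. The paper does not give implementation details, but the idea can be sketched as harmonic inpainting of a dense motion field: pixels covered by moving objects are treated as unknowns, and Laplace's equation (the homogeneous Poisson case) is solved with the background motion as the boundary condition, here with a simple Jacobi iteration. The function name and the choice of iterative solver are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def inpaint_motion(flow, mask, iters=2000):
    """Fill the masked (moving-object) region of one component of a 2D
    motion field by solving Laplace's equation, so the filled-in values
    blend smoothly with the surrounding background motion.
    flow: (H, W) array of one motion component (e.g. horizontal flow).
    mask: (H, W) boolean array, True where the motion is unknown.
    Note: np.roll wraps around the grid edges, so this sketch assumes
    the mask does not touch the image border."""
    f = flow.astype(float).copy()
    f[mask] = flow[~mask].mean()  # crude initial guess inside the hole
    for _ in range(iters):
        # Jacobi update: each unknown pixel moves toward the mean of its
        # four neighbours; known (background) pixels are never modified.
        up    = np.roll(f, -1, axis=0)
        down  = np.roll(f,  1, axis=0)
        left  = np.roll(f, -1, axis=1)
        right = np.roll(f,  1, axis=1)
        avg = (up + down + left + right) / 4.0
        f[mask] = avg[mask]
    return f
```

In practice one would run this once per flow component (u and v) and replace the Jacobi loop with a sparse direct solver for speed; the fixed-point it converges to is the same.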
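The few-shot variant extracts the motion shared by several videos via window-based clustering of their temporal attention maps. As a hedged sketch (the feature layout, window summary, and clustering choice below are assumptions, not the paper's exact procedure): for each spatial window, the per-video features are clustered, and the centroid of the dominant cluster is kept, on the reasoning that object motions differ across videos while the camera motion is common.

```python
import numpy as np

def common_window_features(attn_maps, win=4, iters=20):
    """attn_maps: (N, H, W) per-video temporal-attention features from N
    videos assumed to share a similar camera motion (hypothetical layout).
    For each win x win spatial window, cluster the N per-video window
    means with a tiny 2-means and keep the centroid of the larger
    cluster: outlying videos whose object motion dominates that window
    fall into the smaller cluster and are discarded.
    Returns an (H // win, W // win) map of common features."""
    N, H, W = attn_maps.shape
    out = np.zeros((H // win, W // win))
    for i in range(H // win):
        for j in range(W // win):
            # one scalar summary per video for this window
            x = attn_maps[:, i*win:(i+1)*win, j*win:(j+1)*win].mean(axis=(1, 2))
            c = np.array([x.min(), x.max()])  # deterministic 2-means init
            for _ in range(iters):
                lbl = np.argmin(np.abs(x[:, None] - c[None, :]), axis=1)
                for m in range(2):
                    if np.any(lbl == m):
                        c[m] = x[lbl == m].mean()
            big = np.argmax(np.bincount(lbl, minlength=2))
            out[i, j] = c[big]
    return out
```

With more videos one would cluster full feature vectors per window (e.g. k-means over the attention rows) rather than scalar summaries, but the majority-cluster selection logic is the same.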
URL
https://arxiv.org/abs/2404.15789