Abstract
We present FlexTraj, a framework for image-to-video generation with flexible point trajectory control. FlexTraj introduces a unified point-based motion representation that encodes each point with a segmentation ID, a temporally consistent trajectory ID, and an optional color channel for appearance cues, enabling both dense and sparse trajectory control. Instead of injecting trajectory conditions into the video generator through token concatenation or ControlNet, FlexTraj employs an efficient sequence-concatenation scheme that achieves faster convergence, stronger controllability, and more efficient inference, while remaining robust under unaligned conditions. To train this unified, point-trajectory-controlled video generator, FlexTraj adopts an annealing training strategy that gradually reduces reliance on complete supervision and aligned conditions. Experimental results demonstrate that FlexTraj enables multi-granularity, alignment-agnostic trajectory control for video generation, supporting applications such as motion cloning, drag-based image-to-video, motion interpolation, camera redirection, flexible action control, and mesh animation.
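As a rough illustration of the point representation described above, the sketch below models a control point that carries a segmentation ID, a trajectory ID shared across frames, and an optional color. It also contrasts appending condition tokens along the sequence axis with pairing them channel-wise. The names `TrackPoint`, `trajectory`, `concat_sequence`, and `concat_channels`, and the toy list-of-floats token format, are assumptions made for illustration; they are not taken from the FlexTraj implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Token = List[float]  # toy stand-in for one embedded token (hypothetical)


@dataclass(frozen=True)
class TrackPoint:
    """One tracked point in one frame (hypothetical layout)."""
    frame: int
    xy: Tuple[float, float]
    segmentation_id: int                            # which segment the point belongs to
    trajectory_id: int                              # same value in every frame for the same point
    color: Optional[Tuple[int, int, int]] = None    # optional appearance cue


def trajectory(points: List[TrackPoint], trajectory_id: int) -> List[TrackPoint]:
    """Collect all observations of one trajectory, ordered by frame."""
    selected = [p for p in points if p.trajectory_id == trajectory_id]
    return sorted(selected, key=lambda p: p.frame)


def concat_sequence(video: List[Token], cond: List[Token]) -> List[Token]:
    """Append condition tokens along the sequence axis; lengths may differ."""
    return video + cond


def concat_channels(video: List[Token], cond: List[Token]) -> List[Token]:
    """Pair tokens one-to-one and join their channels; lengths must match."""
    if len(video) != len(cond):
        raise ValueError("channel concatenation needs one condition token per video token")
    return [v + c for v, c in zip(video, cond)]


if __name__ == "__main__":
    points = [
        TrackPoint(frame=1, xy=(2.0, 3.0), segmentation_id=0, trajectory_id=7),
        TrackPoint(frame=0, xy=(1.0, 2.0), segmentation_id=0, trajectory_id=7,
                   color=(255, 0, 0)),
        TrackPoint(frame=0, xy=(5.0, 5.0), segmentation_id=1, trajectory_id=8),
    ]
    print([p.xy for p in trajectory(points, 7)])

    video_tokens = [[0.0, 0.0] for _ in range(4)]
    sparse_cond = [[1.0, 1.0] for _ in range(2)]   # fewer condition tokens than video tokens
    print(len(concat_sequence(video_tokens, sparse_cond)))
```

The sketch shows only one structural difference: joining along the sequence axis does not require a one-to-one pairing between condition tokens and video tokens, whereas channel-wise pairing does. It makes no claim about how FlexTraj embeds points or schedules its annealing training.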
URL
https://arxiv.org/abs/2510.08527