Abstract
OmniLottie is a versatile framework that generates high-quality vector animations from multi-modal instructions. For flexible control over motion and visual content, we focus on Lottie, a lightweight JSON format that represents both shapes and animation behaviors. However, raw Lottie JSON files contain extensive invariant structural metadata and formatting tokens, which pose significant challenges for learning vector animation generation. We therefore introduce a carefully designed Lottie tokenizer that transforms JSON files into structured sequences of commands and parameters representing shapes, animation functions, and control parameters. This tokenizer enables us to build OmniLottie on pretrained vision-language models so that it can follow interleaved multi-modal instructions and generate high-quality vector animations. To further advance research in vector animation generation, we curate MMLottie-2M, a large-scale dataset of professionally designed vector animations paired with textual and visual annotations. Through extensive experiments, we validate that OmniLottie produces vivid and semantically aligned vector animations that adhere closely to multi-modal human instructions.
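To make the tokenizer idea concrete, here is a minimal sketch of how a Lottie JSON file might be flattened into a command-and-parameter sequence. It is not the OmniLottie tokenizer: the command names (`CANVAS`, `LAYER`, `ELLIPSE`, `KEYFRAME`, `STATIC`, and so on) and the functions `tokenize` and `prop_tokens` are hypothetical, chosen only for illustration. The sketch relies only on standard Lottie field names (`w`, `h`, `fr`, `layers`, `ty`, `shapes`, and properties of the form `{"a": ..., "k": ...}`) and handles a small subset of shape types.

```python
import json

# Hypothetical command vocabulary; the abstract does not specify the real one.
SHAPE_COMMANDS = {"rc": "RECT", "el": "ELLIPSE", "fl": "FILL"}


def prop_tokens(name, prop):
    """Flatten one Lottie property ({"a": 0 or 1, "k": ...}) into tokens."""
    if prop.get("a") == 1:
        # Animated property: "k" is a list of keyframes with time "t" and value "s".
        tokens = []
        for keyframe in prop["k"]:
            tokens += ["KEYFRAME", name, keyframe["t"], *keyframe.get("s", [])]
        return tokens
    # Static property: "k" is a scalar or a list of numbers.
    value = prop["k"]
    values = value if isinstance(value, list) else [value]
    return ["STATIC", name, *values]


def tokenize(lottie):
    """Turn a parsed Lottie document into a flat list of commands and parameters."""
    tokens = ["CANVAS", lottie["w"], lottie["h"], "FPS", lottie["fr"]]
    for layer in lottie.get("layers", []):
        if layer.get("ty") != 4:  # 4 is the shape-layer type in Lottie
            continue
        tokens.append("LAYER")
        for shape in layer.get("shapes", []):
            command = SHAPE_COMMANDS.get(shape.get("ty"))
            if command is None:
                continue  # shape types outside this toy subset are skipped
            tokens.append(command)
            # p = position, s = size, c = color, o = opacity
            for key in ("p", "s", "c", "o"):
                if key in shape:
                    tokens += prop_tokens(key, shape[key])
        tokens.append("END_LAYER")
    return tokens


# A tiny hand-written Lottie document: one ellipse sliding to the right.
EXAMPLE = json.loads("""
{"v": "5.7.0", "fr": 30, "ip": 0, "op": 60, "w": 512, "h": 512,
 "layers": [{"ty": 4, "nm": "ball", "shapes": [
   {"ty": "el",
    "p": {"a": 1, "k": [{"t": 0, "s": [100, 100]}, {"t": 60, "s": [400, 100]}]},
    "s": {"a": 0, "k": [50, 50]}}]}]}
""")

if __name__ == "__main__":
    print(tokenize(EXAMPLE))
```

In this toy version, invariant metadata such as the version string and layer name is dropped, and only shape commands, keyframes, and numeric parameters survive, which is the kind of compaction the abstract describes.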
URL
https://arxiv.org/abs/2603.02138