
OmniLottie: Generating Vector Animations via Parameterized Lottie Tokens

2026-03-02 17:59:05
Yiying Yang, Wei Cheng, Sijin Chen, Honghao Fu, Xianfang Zeng, Yujun Cai, Gang Yu, Xingjun Ma

Abstract

OmniLottie is a versatile framework that generates high-quality vector animations from multi-modal instructions. For flexible control over motion and visual content, we focus on Lottie, a lightweight JSON format that represents both shapes and animation behaviors. However, raw Lottie JSON files contain extensive invariant structural metadata and formatting tokens, which pose significant challenges for learning vector animation generation. We therefore introduce a well-designed Lottie tokenizer that transforms JSON files into structured sequences of commands and parameters representing shapes, animation functions, and control parameters. This tokenizer enables us to build OmniLottie on pretrained vision-language models so that it can follow multi-modal interleaved instructions and generate high-quality vector animations. To further advance research in vector animation generation, we curate MMLottie-2M, a large-scale dataset of professionally designed vector animations paired with textual and visual annotations. Extensive experiments validate that OmniLottie produces vivid, semantically aligned vector animations that adhere closely to multi-modal human instructions.
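The abstract does not specify the token format, so the following is only a rough illustration of the general idea: nested Lottie JSON is collapsed into a flat sequence of commands, each carrying numeric parameters. The command names (`CANVAS`, `ELLIPSE`, `POSITION_KEY`, and so on), the integer rounding of parameters, the restriction to shape layers with static geometry, and the helper function names are all hypothetical choices made for this sketch. They are not OmniLottie's actual tokenizer. Only the Lottie keys themselves (`ty`, `ks`, `shapes`, `it`, and animated properties written as `a`/`k` with keyframes `t`/`s`) follow the public Lottie format.

```python
import json

# Hypothetical command vocabulary for illustration only; the paper's actual
# token set is not described in the abstract.
SHAPE_CMDS = {"el": "ELLIPSE", "rc": "RECT", "fl": "FILL"}
TRANSFORM_CMDS = {"p": "POSITION", "s": "SCALE", "r": "ROTATE", "o": "OPACITY"}


def ints(value, scale=1.0):
    """Round a scalar or list of floats to integer parameters (assumed scheme)."""
    values = value if isinstance(value, list) else [value]
    return [round(v * scale) for v in values]


def tokenize_property(cmd, prop, scale=1.0):
    """Static property -> one command; animated property -> one command per keyframe."""
    if prop.get("a") == 1:  # animated: "k" holds keyframes with time "t" and value "s"
        return [(cmd + "_KEY", [kf["t"]] + ints(kf["s"], scale)) for kf in prop["k"]]
    return [(cmd, ints(prop["k"], scale))]


def tokenize_shape(shape):
    """Flatten one shape item; assumes static (non-animated) shape geometry."""
    kind = shape["ty"]
    if kind == "gr":  # group: bracket its items with begin/end commands
        tokens = [("GROUP_BEGIN", [])]
        for item in shape["it"]:
            tokens += tokenize_shape(item)
        return tokens + [("GROUP_END", [])]
    if kind in ("el", "rc"):  # center position followed by size
        return [(SHAPE_CMDS[kind], ints(shape["p"]["k"]) + ints(shape["s"]["k"]))]
    if kind == "fl":  # RGB stored in [0, 1]; mapped to 0-255, alpha dropped
        return [(SHAPE_CMDS[kind], ints(shape["c"]["k"][:3], 255))]
    return []  # other shape types (e.g. group transforms "tr") are skipped here


def tokenize_lottie(doc):
    """Drop JSON structure and keep only command/parameter pairs."""
    tokens = [("CANVAS", [doc["w"], doc["h"], doc["ip"], doc["op"], round(doc["fr"])])]
    for layer in doc["layers"]:
        if layer.get("ty") != 4:  # only shape layers in this sketch
            continue
        tokens.append(("LAYER", [layer["ip"], layer["op"]]))
        for key, cmd in TRANSFORM_CMDS.items():
            if key in layer.get("ks", {}):
                tokens += tokenize_property(cmd, layer["ks"][key])
        for shape in layer.get("shapes", []):
            tokens += tokenize_shape(shape)
    return tokens


# A tiny hand-written animation: an orange circle sliding left to right.
example = {
    "v": "5.7.0", "fr": 30, "ip": 0, "op": 60, "w": 512, "h": 512,
    "layers": [{
        "ty": 4, "ip": 0, "op": 60,
        "ks": {
            "o": {"a": 0, "k": 100},
            "p": {"a": 1, "k": [{"t": 0, "s": [128, 256]}, {"t": 60, "s": [384, 256]}]},
        },
        "shapes": [{"ty": "gr", "it": [
            {"ty": "el", "p": {"a": 0, "k": [0, 0]}, "s": {"a": 0, "k": [80, 80]}},
            {"ty": "fl", "c": {"a": 0, "k": [1, 0.5, 0, 1]}, "o": {"a": 0, "k": 100}},
        ]}],
    }],
}

tokens = tokenize_lottie(example)
for cmd, params in tokens:
    print(cmd, params)
print(len(json.dumps(example)), "JSON characters ->", len(tokens), "commands")
```

Even in this toy case, braces, quotes, and repeated keys such as `"a"` and `"k"` disappear, and what remains is a short sequence a language model could emit command by command. A real tokenizer would also need to handle animated shape geometry, Bezier paths, easing handles, and the many other Lottie layer and shape types that this sketch ignores.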


URL

https://arxiv.org/abs/2603.02138

PDF

https://arxiv.org/pdf/2603.02138.pdf

