Paper Reading AI Learner

LinkTo-Anime: A 2D Animation Optical Flow Dataset from 3D Model Rendering

2025-06-03 10:50:20
Xiaoyi Feng, Kaifeng Zou, Caichun Cen, Tao Huang, Hui Guo, Zizhou Huang, Yingli Zhao, Mingqing Zhang, Diwei Wang, Yuntao Zou, Dagang Li

Abstract

Existing optical flow datasets focus primarily on real-world simulation or synthetic human motion, but few are tailored to Celluloid(cel) anime character motion: a domain with unique visual and motion characteristics. To bridge this gap and facilitate research in optical flow estimation and downstream tasks such as anime video generation and line drawing colorization, we introduce LinkTo-Anime, the first high-quality dataset specifically designed for cel anime character motion generated with 3D model rendering. LinkTo-Anime provides rich annotations including forward and backward optical flow, occlusion masks, and Mixamo Skeleton. The dataset comprises 395 video sequences, totally 24,230 training frames, 720 validation frames, and 4,320 test frames. Furthermore, a comprehensive benchmark is constructed with various optical flow estimation methods to analyze the shortcomings and limitations across multiple datasets.

Abstract (translated)

现有的光学流数据集主要关注现实世界的模拟或合成人体运动,但很少有数据集专门针对赛璐珞(cel)动画角色的运动:这是一个具有独特视觉和运动特征的领域。为了填补这一空白,并促进光学流估算以及下游任务如动漫视频生成和线条画着色的研究,我们引入了LinkTo-Anime,这是第一个专门为使用3D模型渲染的赛璐珞动画角色运动设计的高质量数据集。LinkTo-Anime提供了丰富的注释,包括前向和后向光学流、遮挡掩码以及Mixamo骨架信息。该数据集包含395个视频序列,总计24,230帧用于训练,720帧用于验证,以及4,320帧用于测试。此外,还构建了一个全面的基准测试平台,使用各种光学流估算方法来分析不同数据集中的不足和限制。

URL

https://arxiv.org/abs/2506.02733

PDF

https://arxiv.org/pdf/2506.02733.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Time_Series Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot