ToonCrafter: Generative Cartoon Interpolation

2024-05-28 07:58:33
Jinbo Xing, Hanyuan Liu, Menghan Xia, Yong Zhang, Xintao Wang, Ying Shan, Tien-Tsin Wong

Abstract

We introduce ToonCrafter, a novel approach that transcends traditional correspondence-based cartoon video interpolation, paving the way for generative interpolation. Traditional methods, which implicitly assume linear motion and the absence of complicated phenomena like dis-occlusion, often struggle with the exaggerated non-linear and large motions with occlusion commonly found in cartoons, resulting in implausible or even failed interpolation results. To overcome these limitations, we explore the potential of adapting live-action video priors to better suit cartoon interpolation within a generative framework. ToonCrafter effectively addresses the challenges faced when applying live-action video motion priors to generative cartoon interpolation. First, we design a toon rectification learning strategy that seamlessly adapts live-action video priors to the cartoon domain, resolving the domain gap and content leakage issues. Next, we introduce a dual-reference-based 3D decoder to compensate for lost details due to the highly compressed latent prior spaces, ensuring the preservation of fine details in interpolation results. Finally, we design a flexible sketch encoder that empowers users with interactive control over the interpolation results. Experimental results demonstrate that our proposed method not only produces visually convincing and more natural dynamics, but also effectively handles dis-occlusion. The comparative evaluation demonstrates the notable superiority of our approach over existing competitors.

URL

https://arxiv.org/abs/2405.17933

PDF

https://arxiv.org/pdf/2405.17933.pdf

