Abstract
We introduce ToonCrafter, a novel approach that transcends traditional correspondence-based cartoon video interpolation, paving the way for generative interpolation. Traditional methods, which implicitly assume linear motion and the absence of complicated phenomena such as dis-occlusion, often struggle with the exaggerated, non-linear, and large motions with occlusion commonly found in cartoons, resulting in implausible or even failed interpolation results. To overcome these limitations, we explore the potential of adapting live-action video priors to better suit cartoon interpolation within a generative framework. ToonCrafter addresses the challenges that arise when live-action video motion priors are applied to generative cartoon interpolation. First, we design a toon rectification learning strategy that adapts live-action video priors to the cartoon domain, resolving the domain gap and content leakage issues. Next, we introduce a dual-reference-based 3D decoder to compensate for the details lost in the highly compressed latent prior space, preserving fine details in the interpolation results. Finally, we design a flexible sketch encoder that gives users interactive control over the interpolation results. Experimental results demonstrate that our method not only produces visually convincing and more natural dynamics, but also handles dis-occlusion effectively. Comparative evaluation demonstrates the notable superiority of our approach over existing competitors.
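As a rough illustration of why a linear-motion assumption can fail on non-linear motion, the following minimal sketch compares a linearly interpolated midpoint with the true midpoint of a point moving along a circular arc. It is a toy example and not part of ToonCrafter; the function names `linear_position` and `arc_position`, the unit radius, and the quarter-circle trajectory are all assumptions made for illustration.

```python
import math

def linear_position(p0, p1, t):
    """Position at time t in [0, 1] under the linear-motion assumption."""
    return tuple((1.0 - t) * a + t * b for a, b in zip(p0, p1))

def arc_position(radius, theta0, theta1, t):
    """Position at time t in [0, 1] of a point moving along a circular arc."""
    theta = (1.0 - t) * theta0 + t * theta1
    return (radius * math.cos(theta), radius * math.sin(theta))

# Hypothetical setup: a point sweeps a quarter circle of radius 1 between two key frames.
radius, theta0, theta1 = 1.0, 0.0, math.pi / 2

start = arc_position(radius, theta0, theta1, 0.0)  # position in the first key frame
end = arc_position(radius, theta0, theta1, 1.0)    # position in the second key frame

# The linear assumption places the in-between point on the chord joining the key frames.
linear_mid = linear_position(start, end, 0.5)

# The actual motion places it on the arc.
true_mid = arc_position(radius, theta0, theta1, 0.5)

# Distance between the two midpoints: the error introduced by assuming linear motion.
error = math.dist(linear_mid, true_mid)
print(f"linear midpoint: {linear_mid}")
print(f"true midpoint:   {true_mid}")
print(f"error:           {error:.4f}")
```

In this toy case the linearly interpolated point lands inside the circle rather than on it, and the gap grows with the angle swept between the two key frames.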
URL
https://arxiv.org/abs/2405.17933