Abstract
Video Frame Interpolation (VFI) is a crucial technique in applications such as slow-motion generation, frame rate conversion, and video frame restoration. This paper introduces an efficient video frame interpolation framework that aims to strike a favorable balance between efficiency and quality. Our framework follows the general paradigm of a flow estimator followed by a refinement module, while incorporating carefully designed components. First, we adopt depth-wise convolution with large kernels in the flow estimator, which simultaneously reduces the parameter count and enlarges the receptive field for encoding rich context and handling complex motion. Second, diverging from the common UNet-style (encoder-decoder) design for the refinement module, which we find redundant, our decoder-only refinement module directly enhances the result from coarse to fine features, offering a more efficient process. In addition, to address the challenge of handling high-definition frames, we introduce an HD-aware augmentation strategy during training, leading to consistent improvement on HD images. Extensive experiments are conducted on diverse datasets: Vimeo90K, UCF101, Xiph, and SNU-FILM. The results demonstrate that our approach achieves state-of-the-art performance with clear improvement while requiring far fewer FLOPs and parameters, reaching a better balance between efficiency and quality.
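The claim that large-kernel depth-wise convolution both cuts parameters and widens the receptive field can be checked with simple arithmetic. The sketch below is illustrative only: the channel width (64) and kernel sizes (3 and 7) are assumptions chosen for the example, not values taken from the paper, and the helper functions are hypothetical.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a 2D convolution with a k x k kernel (bias ignored)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees c_in / groups input channels.
    return (c_in // groups) * c_out * k * k


def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1, dilation-1 convolutions."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf


channels = 64  # assumed width, not from the paper

# Standard 3x3 convolution: every output channel mixes all input channels.
standard_3x3 = conv_params(channels, channels, 3)

# Depth-wise 7x7 convolution: groups == channels, one filter per channel.
depthwise_7x7 = conv_params(channels, channels, 7, groups=channels)

# Depth-wise layers do not mix channels, so they are commonly paired with
# a 1x1 point-wise convolution; that cost is included here for fairness.
pointwise_1x1 = conv_params(channels, channels, 1)

print("standard 3x3 weights:      ", standard_3x3)
print("depth-wise 7x7 weights:    ", depthwise_7x7)
print("depth-wise 7x7 + 1x1:      ", depthwise_7x7 + pointwise_1x1)
print("receptive field, two 3x3:  ", receptive_field([3, 3]))
print("receptive field, two 7x7:  ", receptive_field([7, 7]))
```

Under these assumed sizes, the depth-wise layer plus its point-wise companion still uses far fewer weights than a single standard 3x3 layer, while two stacked 7x7 layers cover a much wider spatial window than two stacked 3x3 layers. Whether the paper's flow estimator pairs its depth-wise layers with point-wise convolutions is not stated in the abstract.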
URL
https://arxiv.org/abs/2404.11108