Abstract
Video frame interpolation aims to synthesize intermediate frames that do not exist between the original frames. While recent deep convolutional neural networks have led to significant advances, the quality of interpolation is often degraded by large object motion or occlusion. In this work, we propose a video frame interpolation method that explicitly detects occlusion by exploiting depth information. Specifically, we develop a depth-aware flow projection layer to synthesize intermediate flows that preferentially sample closer objects over farther ones. In addition, we learn hierarchical features to gather contextual information from neighboring pixels. The proposed model then warps the input frames, depth maps, and contextual features based on the optical flow and local interpolation kernels to synthesize the output frame. Our model is compact, efficient, and fully differentiable. Quantitative and qualitative results demonstrate that the proposed model performs favorably against state-of-the-art frame interpolation methods on a wide variety of datasets.
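The key idea of the depth-aware flow projection layer is that, when several source pixels project to the same intermediate-time location, their flows are aggregated with weights proportional to inverse depth, so closer (smaller-depth) objects dominate the result and occlusion is resolved implicitly. Below is a minimal PyTorch sketch of that idea under simplifying assumptions (nearest-neighbor splatting, no hole filling, single time step `t`); the function name and tensor conventions are illustrative, not the authors' implementation.

```python
import torch

def depth_aware_flow_projection(flow_0to1, depth_0, t=0.5, eps=1e-6):
    """Project F_{0->1} to time t, weighting each contributing pixel by its
    inverse depth so that closer objects dominate where flows overlap.

    flow_0to1: (B, 2, H, W) optical flow from frame 0 to frame 1
    depth_0:   (B, 1, H, W) depth map of frame 0 (smaller = closer)
    returns:   (B, 2, H, W) approximate flow from time t back to frame 0
    """
    b, _, h, w = flow_0to1.shape
    device = flow_0to1.device

    # Inverse depth: closer pixels receive larger weights.
    weight = 1.0 / (depth_0 + eps)                        # (B, 1, H, W)

    # Source pixel grid.
    ys, xs = torch.meshgrid(torch.arange(h, device=device),
                            torch.arange(w, device=device), indexing="ij")
    xs = xs.float().expand(b, -1, -1)
    ys = ys.float().expand(b, -1, -1)

    # Location each source pixel reaches at time t (nearest-neighbor splat).
    x_t = (xs + t * flow_0to1[:, 0]).round().long().clamp(0, w - 1)
    y_t = (ys + t * flow_0to1[:, 1]).round().long().clamp(0, h - 1)

    flow_t_to_0 = torch.zeros_like(flow_0to1)             # accumulated flow
    weight_sum = torch.zeros(b, 1, h, w, device=device)   # accumulated weights

    for i in range(b):
        idx = (y_t[i] * w + x_t[i]).reshape(-1)           # flattened targets
        w_i = weight[i, 0].reshape(-1)
        # Splat -t * F_{0->1}, scaled by inverse depth, onto the target grid.
        for c in range(2):
            contrib = (-t * flow_0to1[i, c].reshape(-1)) * w_i
            flow_t_to_0[i, c].reshape(-1).index_add_(0, idx, contrib)
        weight_sum[i, 0].reshape(-1).index_add_(0, idx, w_i)

    # Normalize; pixels that receive no flow stay zero (a full implementation
    # would fill such holes from neighboring projected flows).
    return flow_t_to_0 / weight_sum.clamp(min=eps)
```

The intermediate frame is then obtained by warping both input frames (along with their depth maps and contextual features) with the projected flows and local interpolation kernels, and the whole pipeline remains differentiable because the projection is a weighted average.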
Abstract (translated)
Video frame interpolation aims to synthesize frames that do not exist between the original frames. Although recent deep convolutional neural networks have made significant progress, interpolation quality is often degraded by large object motion or occlusion. In this work, we propose a video frame interpolation method that explicitly detects occlusion by exploring depth information. Specifically, we develop a depth-aware flow projection layer to synthesize intermediate flows that preferentially sample closer objects rather than farther ones. In addition, we learn hierarchical features that gather contextual information from neighboring pixels. The model then warps the input frames, depth maps, and contextual features based on the optical flow and local interpolation kernels to synthesize the output frame. Our model is compact, efficient, and fully differentiable. Quantitative and qualitative results show that the proposed model outperforms state-of-the-art frame interpolation methods on a wide variety of datasets.
URL
https://arxiv.org/abs/1904.00830