Abstract
The performance of video prediction has been greatly boosted by advanced deep neural networks. However, most current methods suffer from large model sizes and require extra inputs, e.g., semantic/depth maps, to achieve promising performance. For the sake of efficiency, in this paper we propose a Dynamic Multi-scale Voxel Flow Network (DMVFN) that achieves better video prediction performance at lower computational cost than previous methods, using only RGB images. The core of our DMVFN is a differentiable routing module that effectively perceives the motion scales of video frames. Once trained, our DMVFN selects adaptive sub-networks for different inputs at the inference stage. Experiments on several benchmarks demonstrate that our DMVFN is an order of magnitude faster than Deep Voxel Flow and surpasses the state-of-the-art iterative-based OPT in generated image quality. Our code and demo are available at this https URL.
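To make the idea of input-adaptive routing concrete, here is a minimal toy sketch: a router estimates motion magnitude between two frames and picks one of several "sub-networks" accordingly, with larger motion routed to a more capable path. All names, thresholds, and sub-networks below are hypothetical illustrations, not DMVFN's actual architecture (which learns a differentiable routing module end-to-end).

```python
import numpy as np

def route(frame_a: np.ndarray, frame_b: np.ndarray, thresholds=(0.05, 0.15)):
    """Toy router: return a sub-network index, larger motion -> later index.

    Hypothetical motion proxy: mean absolute pixel difference between frames.
    """
    motion = np.mean(np.abs(frame_b - frame_a))
    for i, t in enumerate(thresholds):
        if motion < t:
            return i
    return len(thresholds)

# Hypothetical sub-networks of increasing "capacity" (stand-ins for real ones).
sub_networks = [
    lambda a, b: b,                          # tiny motion: repeat last frame
    lambda a, b: 2 * b - a,                  # medium motion: linear extrapolation
    lambda a, b: np.clip(2 * b - a, 0, 1),   # large motion: extrapolate + clamp
]

rng = np.random.default_rng(0)
f0 = rng.random((8, 8, 3))
f1 = np.clip(f0 + 0.01, 0, 1)  # a nearly static pair of frames
idx = route(f0, f1)            # small motion selects the cheapest path
pred = sub_networks[idx](f0, f1)
```

At inference time this kind of gating lets easy (low-motion) inputs skip expensive computation entirely, which is the efficiency argument the abstract makes; the paper's routing module is trained to make this decision differentiably rather than with fixed thresholds.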
URL
https://arxiv.org/abs/2303.09875