
MMVP: Motion-Matrix-based Video Prediction

2023-08-30 17:20:46
Yiqi Zhong, Luming Liang, Ilya Zharkov, Ulrich Neumann

Abstract

A central challenge of video prediction lies in reasoning about objects' future motions from image frames while simultaneously maintaining the consistency of their appearances across frames. This work introduces an end-to-end trainable two-stream video prediction framework, Motion-Matrix-based Video Prediction (MMVP), to tackle this challenge. Unlike previous methods that usually handle motion prediction and appearance maintenance within the same set of modules, MMVP decouples motion and appearance information by constructing appearance-agnostic motion matrices. The motion matrices represent the temporal similarity of every pair of feature patches in the input frames and are the sole input of the motion prediction module in MMVP. This design improves video prediction in both accuracy and efficiency while reducing the model size. Results of extensive experiments demonstrate that MMVP outperforms state-of-the-art systems on public datasets by non-negligible margins (about 1 dB in PSNR on UCF Sports) with significantly smaller models (84% of the size or smaller). Please refer to this https URL for the official code and the datasets used in this paper.
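As a rough illustration of the idea, the sketch below builds an appearance-agnostic similarity matrix between the feature patches of two frames: every spatial location of one frame's feature map is compared with every location of the next frame's, and only these similarities (not the appearance features themselves) would be fed to a motion prediction module. The function name, tensor shapes, and the use of cosine similarity are illustrative assumptions, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def motion_matrix(feat_t, feat_t1):
        """Pairwise patch-similarity matrix between two frames' feature maps.

        feat_t, feat_t1: (B, C, H, W) feature maps of frames t and t+1,
        where each spatial location is treated as one feature patch.
        Returns a (B, H*W, H*W) matrix whose entry (i, j) is the cosine
        similarity between patch i of frame t and patch j of frame t+1.
        """
        B, C, H, W = feat_t.shape
        # Flatten the spatial grid into H*W patch vectors per frame and
        # L2-normalize along the channel dimension.
        a = F.normalize(feat_t.reshape(B, C, H * W), dim=1)   # (B, C, N)
        b = F.normalize(feat_t1.reshape(B, C, H * W), dim=1)  # (B, C, N)
        # Dot products of every patch in frame t with every patch in frame t+1.
        return torch.einsum('bcn,bcm->bnm', a, b)             # (B, N, N)

    # Usage (hypothetical feature maps): the resulting matrices carry only
    # temporal correspondence information, decoupled from appearance.
    f_t, f_t1 = torch.randn(2, 64, 16, 16), torch.randn(2, 64, 16, 16)
    M = motion_matrix(f_t, f_t1)   # shape: (2, 256, 256)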

URL

https://arxiv.org/abs/2308.16154

PDF

https://arxiv.org/pdf/2308.16154.pdf

