Cubic LSTMs for Video Prediction

Abstract

Predicting future frames in videos has become a promising research direction for both the computer vision and robot learning communities. The core of this problem involves capturing moving objects and predicting their future motion. While object capture specifies which objects are moving in a video, motion prediction describes their future dynamics. Motivated by this analysis, we propose a Cubic Long Short-Term Memory (CubicLSTM) unit for video prediction. CubicLSTM consists of three branches: a spatial branch for capturing moving objects, a temporal branch for processing motions, and an output branch that combines the first two branches to generate predicted frames. Stacking multiple CubicLSTM units along the spatial and output branches, and then evolving them along the temporal branch, forms a cubic recurrent neural network (CubicRNN). Experiments show that CubicRNN produces more accurate video predictions than prior methods on both synthetic and real-world datasets.
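
The three-branch layout described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes dense vectors instead of the convolutional feature maps a video model would use, omits biases, and assumes that both branches read the same concatenated input and that the output branch is a single linear map over the two new hidden states. The names `ToyCubicCell`, `lstm_update`, `W_temporal`, `W_spatial`, and `W_out` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_update(W, z, c_prev, n):
    """Standard LSTM gate update on input z; returns (hidden, cell)."""
    pre = matvec(W, z)                                # 4n pre-activations
    i = [sigmoid(p) for p in pre[0:n]]                # input gate
    f = [sigmoid(p) for p in pre[n:2 * n]]            # forget gate
    o = [sigmoid(p) for p in pre[2 * n:3 * n]]        # output gate
    g = [math.tanh(p) for p in pre[3 * n:4 * n]]      # candidate cell values
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


class ToyCubicCell:
    """Assumed simplification: two LSTM branches plus a linear output branch."""

    def __init__(self, in_dim, hid_dim, seed=0):
        rng = random.Random(seed)
        z_dim = in_dim + 2 * hid_dim  # input + temporal hidden + spatial hidden

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.n = hid_dim
        self.W_temporal = mat(4 * hid_dim, z_dim)  # temporal-branch gates
        self.W_spatial = mat(4 * hid_dim, z_dim)   # spatial-branch gates
        self.W_out = mat(in_dim, 2 * hid_dim)      # output-branch projection

    def step(self, x, temporal, spatial):
        h_t, c_t = temporal
        h_s, c_s = spatial
        z = x + h_t + h_s  # list concatenation, shared by both branches
        new_h_t, new_c_t = lstm_update(self.W_temporal, z, c_t, self.n)
        new_h_s, new_c_s = lstm_update(self.W_spatial, z, c_s, self.n)
        y = matvec(self.W_out, new_h_t + new_h_s)  # combine the two branches
        return y, (new_h_t, new_c_t), (new_h_s, new_c_s)


# One step on a made-up 3-value "frame" with zero initial states.
cell = ToyCubicCell(in_dim=3, hid_dim=4)
zeros = [0.0] * 4
y, temporal_state, spatial_state = cell.step(
    [0.1, 0.2, 0.3], (zeros, zeros), (zeros, zeros))
```

The sketch returns the two states separately because the abstract says units are stacked along the spatial and output branches and evolved along the temporal branch, so each state would be handed to a different neighbouring unit. How exactly the states are routed through a full CubicRNN is not specified in the abstract and is left out here.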

Abstract (translated)

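Predicting future frames in video has become a promising research direction in the fields of computer vision and robot learning. The core of the problem is capturing moving objects and predicting future motion. Object capture identifies which objects are moving in the video, while motion prediction describes their future dynamics. Based on this analysis, we propose a Cubic Long Short-Term Memory (CubicLSTM) unit for video prediction. CubicLSTM is composed of three branches: a spatial branch for capturing moving objects, a temporal branch for processing motion, and an output branch that combines the first two branches to generate predicted frames. Stacking multiple CubicLSTM units along the spatial and output branches, and then evolving them along the temporal branch, yields a cubic recurrent neural network (CubicRNN). Experiments show that CubicRNN produces more accurate video predictions than previous methods on both synthetic and real-world datasets.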

URL

https://arxiv.org/abs/1904.09412

PDF

https://arxiv.org/pdf/1904.09412.pdf