Paper Reading AI Learner

PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video Prediction

2023-05-19 04:16:50
Hao Wu, Wei Xiong, Fan Xu, Xiao Luo, Chong Chen, Xian-Sheng Hua, Haixin Wang

Abstract

In this paper, we investigate the challenge of spatio-temporal video prediction, which involves generating future videos from historical data streams. Existing approaches typically utilize external information such as semantic maps to enhance video prediction, but they often neglect the inherent physical knowledge embedded within videos. Furthermore, their high computational demands may impede their application to high-resolution videos. To address these constraints, we introduce a novel approach called Physics-assisted Spatio-temporal Network (PastNet) for generating high-quality video predictions. The core of PastNet lies in incorporating a spectral convolution operator in the Fourier domain, which efficiently introduces inductive biases from the underlying physical laws. Additionally, we employ a memory bank with the estimated intrinsic dimensionality to discretize local features during the processing of complex spatio-temporal signals, thereby reducing computational costs and facilitating efficient high-resolution video prediction. Extensive experiments on various widely used datasets demonstrate the effectiveness and efficiency of the proposed PastNet compared with state-of-the-art methods, particularly in high-resolution scenarios.
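The two mechanisms the abstract describes can be sketched in a few lines. Below is a minimal, illustrative NumPy version: a spectral convolution that applies learned complex weights to low-frequency Fourier modes (in the style of Fourier neural operators, which the paper's operator resembles), and a nearest-codeword lookup standing in for the memory-bank discretization. All function names, shapes, and sizes here are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Spectral convolution sketch (illustrative, FNO-style):
# transform to the Fourier domain, scale only the lowest n_modes
# frequencies by learned complex weights, and transform back.
def spectral_conv1d(x, weights, n_modes):
    x_ft = np.fft.rfft(x)                        # real-input FFT
    out_ft = np.zeros_like(x_ft)
    out_ft[:n_modes] = x_ft[:n_modes] * weights  # keep/mix low-frequency modes only
    return np.fft.irfft(out_ft, n=len(x))        # back to the spatial domain

# Memory-bank discretization sketch: replace each local feature
# vector with its nearest entry in a fixed codebook.
def quantize(features, codebook):
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=-1)
    return codebook[dists.argmin(axis=1)]

# Toy usage: a 64-sample signal, 8 retained modes, a 16-entry codebook.
x = np.sin(np.linspace(0, 2 * np.pi, 64, endpoint=False))
w = rng.standard_normal(8) + 1j * rng.standard_normal(8)  # stand-in for learned weights
y = spectral_conv1d(x, w, n_modes=8)

feats = rng.standard_normal((5, 4))
book = rng.standard_normal((16, 4))   # codebook size/dimension are illustrative
q = quantize(feats, book)
```

Restricting the operator to a fixed number of Fourier modes is what keeps the cost low relative to dense convolutions at high resolution, and the codebook lookup bounds the feature space the downstream predictor must model.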

Abstract (translated)

在本文中,我们研究时空视频预测的挑战,即基于历史数据流生成未来视频。现有方法通常利用外部信息(如语义图)来增强视频预测,但常常忽略视频中蕴含的物理知识;此外,其高计算开销可能阻碍其在高分辨率视频上的应用。为了解决这些限制,我们提出了一种称为物理辅助时空网络(PastNet)的新方法,用于生成高质量的视频预测。PastNet 的核心在于在傅里叶域中引入谱卷积算子,从而高效地引入来自底层物理规律的归纳偏置。此外,在处理复杂时空信号时,我们采用带有估计内在维度的记忆库来离散化局部特征,从而降低计算成本并促进高效的高分辨率视频预测。在多个广泛使用的数据集上的大量实验表明,与最先进的方法相比,所提出的 PastNet 具有更高的有效性和效率,尤其是在高分辨率场景下。

URL

https://arxiv.org/abs/2305.11421

PDF

https://arxiv.org/pdf/2305.11421.pdf

