Abstract
The ability to anticipate the future is essential when making real-time critical decisions, provides valuable information for understanding dynamic natural scenes, and can help unsupervised video representation learning. State-of-the-art video prediction is based on LSTM recurrent networks and/or generative adversarial network learning. These are complex architectures that need to learn large numbers of parameters, are potentially hard to train and slow to run, and may produce blurry predictions. In this paper, we introduce DYAN, a novel network that has very few parameters and is easy to train, and that produces accurate, high-quality frame predictions significantly faster than previous approaches. DYAN owes its good qualities to its encoder and decoder, which are designed following concepts from system identification theory and exploit the dynamics-based invariants of the data. Extensive experiments on several standard video datasets show that DYAN is superior at generating frames and that it generalizes well across domains.
URL
https://arxiv.org/abs/1803.07201