Paper Reading AI Learner

DYAN: A Dynamical Atoms-Based Network for Video Prediction

2018-09-14 21:25:54
Wenqian Liu, Abhishek Sharma, Octavia Camps, Mario Sznaier

Abstract

The ability to anticipate the future is essential when making real time critical decisions, provides valuable information to understand dynamic natural scenes, and can help unsupervised video representation learning. State-of-art video prediction is based on LSTM recursive networks and/or generative adversarial network learning. These are complex architectures that need to learn large numbers of parameters, are potentially hard to train, slow to run, and may produce blurry predictions. In this paper, we introduce DYAN, a novel network with very few parameters and easy to train, which produces accurate, high quality frame predictions, significantly faster than previous approaches. DYAN owes its good qualities to its encoder and decoder, which are designed following concepts from systems identification theory and exploit the dynamics-based invariants of the data. Extensive experiments using several standard video datasets show that DYAN is superior generating frames and that it generalizes well across domains.

Abstract (translated)

在做出实时关键决策,提供有价值的信息来理解动态自然场景时,预测未来的能力是必不可少的,并且可以帮助无监督的视频表示学习。最先进的视频预测基于LSTM递归网络和/或生成性对抗性网络学习。这些是复杂的架构,需要学习大量参数,可能难以训练,运行缓慢,并且可能产生模糊的预测。在本文中,我们介绍DYAN,一种参数极少且易于训练的新型网络,可以产生准确,高质量的帧预测,比以前的方法快得多。 DYAN的优点在于其编码器和解码器,它们遵循系统识别理论的概念设计,并利用基于动力学的数据不变量。使用几个标准视频数据集的大量实验表明,DYAN是优越的生成帧,并且它跨域很好地推广。

URL

https://arxiv.org/abs/1803.07201

PDF

https://arxiv.org/pdf/1803.07201.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot