Paper Reading AI Learner

A Survey on Long Video Generation: Challenges, Methods, and Prospects

2024-03-25 03:47:53
Chengxuan Li, Di Huang, Zeyu Lu, Yang Xiao, Qingqi Pei, Lei Bai

Abstract

Video generation is a rapidly advancing research area, garnering significant attention due to its broad range of applications. One critical aspect of this field is the generation of long-duration videos, which presents unique challenges and opportunities. This paper presents the first survey of recent advancements in long video generation and summarises them into two key paradigms: divide and conquer temporal autoregressive. We delve into the common models employed in each paradigm, including aspects of network design and conditioning techniques. Furthermore, we offer a comprehensive overview and classification of the datasets and evaluation metrics which are crucial for advancing long video generation research. Concluding with a summary of existing studies, we also discuss the emerging challenges and future directions in this dynamic field. We hope that this survey will serve as an essential reference for researchers and practitioners in the realm of long video generation.

Abstract (translated)

视频生成是一个快速发展的研究领域,因其广泛的应用而受到广泛关注。这个领域的一个关键方面是生成长时视频,这带来了独特的挑战和机遇。本文是对近年来长视频生成研究的第一次调查,并将它们总结为两个关键范式:分而治之的时间自回归。我们深入研究每个范式中使用的常见模型,包括网络设计和调节技术。此外,我们提供了关于对长视频生成研究至关重要的数据集和评估指标的全面概述和分类。结论部分我们对现有研究进行总结,并讨论了该动态领域中新兴的挑战和未来的发展方向。我们希望这次调查能成为深入研究长视频生成的研究人员和实践者的有益参考。

URL

https://arxiv.org/abs/2403.16407

PDF

https://arxiv.org/pdf/2403.16407.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot