
LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field

2024-04-13 11:07:53
Jiyang Li, Lechao Cheng, Zhangye Wang, Tingting Mu, Jingxuan He

Abstract

A cinemagraph is a unique form of visual media that combines elements of still photography and subtle motion to create a captivating experience. However, the majority of videos generated by recent works lack depth information and are confined to 2D image space. In this paper, inspired by significant progress in the field of novel view synthesis (NVS) achieved by 3D Gaussian Splatting (3D-GS), we propose LoopGaussian to elevate cinemagraphs from 2D image space to 3D space using 3D Gaussian modeling. To achieve this, we first employ the 3D-GS method to reconstruct 3D Gaussian point clouds from multi-view images of static scenes, incorporating shape regularization terms to prevent blurring or artifacts caused by object deformation. We then adopt an autoencoder tailored for 3D Gaussians to project them into feature space. To maintain the local continuity of the scene, we devise SuperGaussian for clustering based on the acquired features. By calculating the similarity between clusters and employing a two-stage estimation method, we derive an Eulerian motion field to describe velocities across the entire scene. The 3D Gaussian points then move within the estimated Eulerian motion field. Through bidirectional animation techniques, we ultimately generate a 3D cinemagraph that exhibits natural and seamlessly loopable dynamics. Experimental results validate the effectiveness of our approach, demonstrating high-quality and visually appealing scene generation.
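The last two steps of the abstract (moving points through a static Eulerian velocity field, then using bidirectional animation to close the loop) can be illustrated with a minimal sketch. This is not the authors' implementation: the rotational `velocity` field, the explicit Euler integrator, the helper names, and the linear blend of forward and backward trajectories are all assumptions chosen to show why the first and last frames coincide.

```python
import numpy as np

def velocity(points):
    """Hypothetical static Eulerian field: rotation about the z-axis.
    In the paper the field is estimated from the scene; here it is made up."""
    x, y = points[:, 0], points[:, 1]
    return np.stack([-y, x, np.zeros_like(x)], axis=1)

def advect(points, steps, dt):
    """Move points through the field with explicit Euler steps.
    A negative dt integrates backward in time."""
    p = points.copy()
    for _ in range(steps):
        p = p + dt * velocity(p)
    return p

def looped_positions(points, n_frames, dt=0.05):
    """Blend a forward trajectory (t steps from the start) with a backward
    trajectory (n_frames - t steps from the start, reversed in time).
    Assumed linear blend weights: 1 - t/n_frames and t/n_frames."""
    frames = []
    for t in range(n_frames + 1):
        forward = advect(points, t, dt)
        backward = advect(points, n_frames - t, -dt)
        w = t / n_frames
        frames.append((1.0 - w) * forward + w * backward)
    return np.stack(frames)

# Two example Gaussian centers (illustrative values only).
points = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.5]])
frames = looped_positions(points, n_frames=8)
```

At frame 0 the blend uses only the forward trajectory with zero steps, and at the final frame it uses only the backward trajectory with zero steps, so both equal the original positions and the sequence loops without a jump.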


URL

https://arxiv.org/abs/2404.08966

PDF

https://arxiv.org/pdf/2404.08966.pdf

