Not All Frame Features Are Equal: Video-to-4D Generation via Decoupling Dynamic-Static Features

2025-02-12 13:08:35
Liying Yang, Chen Liu, Zhenwei Zhu, Ajian Liu, Hui Ma, Jian Nong, Yanyan Liang

Abstract

Recently, the generation of dynamic 3D objects from a video has shown impressive results. Existing methods directly optimize Gaussians using all the information in the frames. However, when dynamic regions are interwoven with static regions within frames, particularly when the static regions account for a large proportion, existing methods often overlook information in dynamic regions and are prone to overfitting on static regions. This produces results with blurry textures. We consider that decoupling dynamic and static features to enhance dynamic representations can alleviate this issue. Thus, we propose a dynamic-static feature decoupling module (DSFD). Along temporal axes, it regards the portions of current frame features that differ significantly from the reference frame features as dynamic features; the remaining portions are the static features. We then acquire decoupled features driven by the dynamic features and the current frame features. Moreover, to further enhance the dynamic representation of decoupled features from different viewpoints and ensure accurate motion prediction, we design a temporal-spatial similarity fusion module (TSSF). Along spatial axes, it adaptively selects similar information from dynamic regions. Building on these modules, we construct a novel approach, DS4D. Experimental results verify that our method achieves state-of-the-art (SOTA) results in video-to-4D. In addition, experiments on a real-world scenario dataset demonstrate its effectiveness on 4D scenes. Our code will be publicly available.
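
The decoupling idea in the abstract (treat the parts of the current frame features that differ significantly from the reference frame features as dynamic, and the rest as static) can be illustrated with a toy sketch. This is not the authors' DSFD implementation: the per-token L2 distance, the hard `threshold`, and the function name `decouple_dynamic_static` are all assumptions made for illustration, and the real module presumably operates on learned features.

```python
import math

def decouple_dynamic_static(current, reference, threshold=0.5):
    """Toy split of current-frame features into dynamic and static parts.

    current, reference: lists of feature vectors (one vector per token).
    threshold: hypothetical cutoff on the per-token L2 difference.
    Returns (dynamic, static, mask); masked-out tokens are zero vectors.
    """
    dynamic, static, mask = [], [], []
    for cur, ref in zip(current, reference):
        # Difference between the current token and its reference counterpart.
        diff = math.sqrt(sum((c - r) ** 2 for c, r in zip(cur, ref)))
        is_dynamic = diff > threshold
        mask.append(is_dynamic)
        zeros = [0.0] * len(cur)
        dynamic.append(list(cur) if is_dynamic else zeros)
        static.append(zeros if is_dynamic else list(cur))
    return dynamic, static, mask

# Three tokens: unchanged, strongly changed, and only slightly changed.
reference = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
current = [[1.0, 1.0], [2.0, 4.0], [3.1, 3.0]]
dynamic, static, mask = decouple_dynamic_static(current, reference)
```

In this sketch only the second token exceeds the assumed threshold, so it alone is kept in `dynamic`; the other two tokens stay in `static`.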

URL

https://arxiv.org/abs/2502.08377

PDF

https://arxiv.org/pdf/2502.08377.pdf

