Paper Reading AI Learner

StyleDyRF: Zero-shot 4D Style Transfer for Dynamic Neural Radiance Fields

2024-03-13 07:42:21
Hongbin Xu, Weitao Chen, Feng Xiao, Baigui Sun, Wenxiong Kang

Abstract

4D style transfer aims to transfer an arbitrary visual style to the synthesized novel views of a dynamic 4D scene across varying viewpoints and times. Existing efforts on 3D style transfer can effectively combine the visual features of style images and neural radiance fields (NeRF) but, constrained by the static-scene assumption, fail to handle dynamic 4D scenes. Consequently, we tackle the novel and challenging problem of 4D style transfer for the first time, which further requires consistency of the stylized results on dynamic objects. In this paper, we introduce StyleDyRF, a method that represents the 4D feature space by deforming a canonical feature volume and learns a linear style transformation matrix on the feature volume in a data-driven fashion. To obtain the canonical feature volume, the rays at each time step are deformed with the geometric prior of a pre-trained dynamic NeRF to render the feature map under the supervision of pre-trained visual encoders. Given the content cues in the canonical feature volume and the style cues in the style image, we learn the style transformation matrix from their covariance matrices with lightweight neural networks. The learned style transformation matrix reflects a direct matching of feature covariance from the content volume to the given style pattern, in analogy with the optimization of the Gram matrix in traditional 2D neural style transfer. The experimental results show that our method not only renders 4D photorealistic style transfer results in a zero-shot manner but also outperforms existing methods in terms of visual quality and consistency.
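StyleDyRF learns its style transformation matrix with lightweight neural networks, but the underlying idea of matching feature covariance from content to style can be illustrated with the classical whitening-coloring transform. The NumPy sketch below is only an illustration of that covariance-matching principle, not the paper's implementation; the function name and toy features are hypothetical.

```python
import numpy as np

def linear_style_transform(content_feat, style_feat, eps=1e-5):
    """Whitening-coloring transform: a closed-form linear style
    transformation that matches the covariance of content features
    to that of style features. Inputs are C x N (channels x samples)."""
    c_mean = content_feat.mean(axis=1, keepdims=True)
    s_mean = style_feat.mean(axis=1, keepdims=True)
    cf = content_feat - c_mean
    sf = style_feat - s_mean

    # Covariance matrices (C x C), regularized for numerical stability
    c_cov = cf @ cf.T / (cf.shape[1] - 1) + eps * np.eye(cf.shape[0])
    s_cov = sf @ sf.T / (sf.shape[1] - 1) + eps * np.eye(sf.shape[0])

    # Whitening: remove the content covariance
    evals_c, evecs_c = np.linalg.eigh(c_cov)
    whiten = evecs_c @ np.diag(evals_c ** -0.5) @ evecs_c.T

    # Coloring: impose the style covariance
    evals_s, evecs_s = np.linalg.eigh(s_cov)
    color = evecs_s @ np.diag(evals_s ** 0.5) @ evecs_s.T

    # This C x C product plays the role of a style transformation matrix
    transform = color @ whiten
    return transform @ cf + s_mean

# Toy usage: random "features" with 8 channels and 100 samples each
rng = np.random.default_rng(0)
content = rng.normal(size=(8, 100))
style = 2.0 * rng.normal(size=(8, 100)) + 1.0
out = linear_style_transform(content, style)
```

After the transform, the output's feature statistics (mean and covariance) match those of the style features, which is the same matching objective that optimizing the Gram matrix pursues in 2D neural style transfer.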

URL

https://arxiv.org/abs/2403.08310

PDF

https://arxiv.org/pdf/2403.08310.pdf

