Paper Reading AI Learner

ArtNeRF: A Stylized Neural Field for 3D-Aware Cartoonized Face Synthesis

2024-04-21 16:45:35
Zichen Tang, Hongyu Yang

Abstract

Recent advances in generative visual models and neural radiance fields have greatly boosted 3D-aware image synthesis and stylization tasks. However, previous NeRF-based work is limited to single-scene stylization, and training a model to generate 3D-aware cartoon faces with arbitrary styles remains unsolved. We propose ArtNeRF, a novel face stylization framework derived from a 3D-aware GAN, to tackle this problem. In this framework, we use an expressive generator to synthesize stylized faces and a triple-branch discriminator module to improve the visual quality and style consistency of the generated faces. Specifically, a style encoder based on contrastive learning extracts robust low-dimensional embeddings of style images, empowering the generator with knowledge of various styles. To smooth the training process of cross-domain transfer learning, we propose an adaptive style blending module that helps inject style information and allows users to freely tune the level of stylization. We further introduce a neural rendering module to achieve efficient real-time rendering of images at higher resolutions. Extensive experiments demonstrate that ArtNeRF is versatile in generating high-quality 3D-aware cartoon faces with arbitrary styles.
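The abstract does not specify the style encoder's training loss or how the adaptive blending module mixes style information. As a rough illustration only, the sketch below assumes a standard InfoNCE contrastive loss over style embeddings (an image of the same style as the positive, other styles as negatives) and a plain linear interpolation controlled by a user-set `level`. The names `info_nce`, `blend_style`, and `level` are hypothetical, and ArtNeRF's actual adaptive module may work quite differently.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (returns zeros for a zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else [0.0 for _ in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: List[Sequence[float]], temperature: float = 0.1) -> float:
    """Generic InfoNCE loss for one anchor embedding (an assumption, not the
    paper's stated loss). Minimizing it pulls the same-style positive closer
    and pushes other-style negatives away. The result is always >= 0."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denom - logits[0]


def blend_style(content_code: Sequence[float], style_code: Sequence[float],
                level: float) -> Vector:
    """Hypothetical user-tunable blending: level=0 keeps the unstylized code,
    level=1 uses the full style code, and values in between interpolate."""
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be in [0, 1]")
    return [(1.0 - level) * c + level * s
            for c, s in zip(content_code, style_code)]


# Example: a half-strength blend of a zero "unstylized" code and a style code.
print(blend_style([0.0, 0.0], [1.0, 2.0], 0.5))  # → [0.5, 1.0]
```

Linear interpolation is chosen here only because it makes the "tune the level of stylization" idea concrete with a single scalar; a learned, feature-level blend would be a more likely design in a real 3D-aware GAN.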

URL

https://arxiv.org/abs/2404.13711

PDF

https://arxiv.org/pdf/2404.13711.pdf
