SemanticHuman-HD: High-Resolution Semantic Disentangled 3D Human Generation

2024-03-15 10:18:56
Peng Zheng, Tao Liu, Zili Yi, Rui Ma

Abstract

With the development of neural radiance fields and generative models, numerous methods have been proposed for learning 3D human generation from 2D images. These methods allow control over the pose of the generated 3D human and enable rendering from different viewpoints. However, none of them explores semantic disentanglement in human image synthesis; that is, they cannot separately control the generation of different semantic parts, such as the body, tops, and bottoms. Furthermore, existing methods are limited to synthesizing images at $512^2$ resolution because of the high computational cost of neural radiance fields. To address these limitations, we introduce SemanticHuman-HD, the first method to achieve semantically disentangled human image synthesis. Notably, SemanticHuman-HD is also the first method to achieve 3D-aware image synthesis at $1024^2$ resolution, which is made possible by our proposed 3D-aware super-resolution module. By using depth maps and semantic masks as guidance for the 3D-aware super-resolution, we significantly reduce the number of sampling points during volume rendering, thereby reducing the computational cost. Our comparative experiments demonstrate the superiority of our method, and ablation studies verify the effectiveness of each proposed component. Moreover, our method opens up possibilities for various applications, including 3D garment generation, semantic-aware image synthesis, controllable image synthesis, and out-of-domain image synthesis.
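
The abstract does not say how the depth maps guide sampling, so the following is only a minimal sketch of one common way a depth estimate can cut the number of sampling points per ray: instead of spreading many samples over the whole near-to-far range, a few samples are placed in a narrow band around the estimated surface depth. It is not the authors' implementation. The function names, the `band` parameter, and the sample counts are all hypothetical, and semantic-mask guidance is not modeled here.

```python
import numpy as np

def uniform_samples(near, far, n_samples, n_rays):
    """Baseline: spread n_samples evenly over [near, far] for every ray."""
    t = np.linspace(near, far, n_samples)
    return np.broadcast_to(t, (n_rays, n_samples)).copy()

def depth_guided_samples(depth, band, n_samples, near, far):
    """Hypothetical depth-guided variant: place n_samples within
    +/- band of each ray's estimated depth, clipped to [near, far]."""
    offsets = np.linspace(-band, band, n_samples)   # shape (n_samples,)
    t = depth[:, None] + offsets[None, :]           # shape (n_rays, n_samples)
    return np.clip(t, near, far)

# Three rays with assumed depth estimates (e.g. from a low-resolution render).
depth = np.array([1.0, 2.0, 2.9])
dense = uniform_samples(near=0.5, far=3.0, n_samples=64, n_rays=depth.size)
sparse = depth_guided_samples(depth, band=0.2, n_samples=5, near=0.5, far=3.0)

print(dense.shape, sparse.shape)  # per-ray sample count drops from 64 to 5
```

Under these assumptions the per-ray cost of evaluating the radiance field scales with the number of samples, so concentrating a handful of samples near the surface is what makes a higher output resolution affordable.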

URL

https://arxiv.org/abs/2403.10166

PDF

https://arxiv.org/pdf/2403.10166.pdf

