Paper Reading AI Learner

CGS-GAN: 3D Consistent Gaussian Splatting GANs for High Resolution Human Head Synthesis

2025-05-23 07:56:25
Florian Barthel, Wieland Morgenstern, Paul Hinzer, Anna Hilsmann, Peter Eisert

Abstract

Recently, 3D GANs based on 3D Gaussian splatting have been proposed for high quality synthesis of human heads. However, existing methods stabilize training and enhance rendering quality from steep viewpoints by conditioning the random latent vector on the current camera position. This compromises 3D consistency, as we observe significant identity changes when re-synthesizing the 3D head with each camera shift. Conversely, fixing the camera to a single viewpoint yields high-quality renderings for that perspective but results in poor performance for novel views. Removing view-conditioning typically destabilizes GAN training, often causing the training to collapse. In response to these challenges, we introduce CGS-GAN, a novel 3D Gaussian Splatting GAN framework that enables stable training and high-quality 3D-consistent synthesis of human heads without relying on view-conditioning. To ensure training stability, we introduce a multi-view regularization technique that enhances generator convergence with minimal computational overhead. Additionally, we adapt the conditional loss used in existing 3D Gaussian splatting GANs and propose a generator architecture designed to not only stabilize training but also facilitate efficient rendering and straightforward scaling, enabling output resolutions up to $2048^2$. To evaluate the capabilities of CGS-GAN, we curate a new dataset derived from FFHQ. This dataset enables very high resolutions, focuses on larger portions of the human head, reduces view-dependent artifacts for improved 3D consistency, and excludes images where subjects are obscured by hands or other objects. As a result, our approach achieves very high rendering quality, supported by competitive FID scores, while ensuring consistent 3D scene generation. Check out our project page here: this https URL
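The abstract names a multi-view regularization technique but does not describe its mechanics. Below is a minimal, hypothetical NumPy sketch of the general idea as stated: render the same generated 3D scene from several random cameras within one generator step and average the adversarial loss, so the generator cannot specialize to a single viewpoint. Every name here (`generate_scene`, `render`, `discriminator_loss`, `num_views`) is an illustrative stub, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_scene(z):
    # Stand-in for the 3DGS generator: maps a latent vector to a set of
    # Gaussian parameters (here just 3D center positions, for illustration).
    # Note: no camera input -- the scene is view-independent by construction.
    return np.tanh(z.reshape(-1, 3))

def render(scene, camera):
    # Renderer stub: rotates the Gaussian centers by a camera rotation and
    # returns a flat "image" vector.
    return (scene @ camera).ravel()

def discriminator_loss(image):
    # Stand-in adversarial loss on one rendered view (view-dependent on
    # purpose, so different cameras yield different losses).
    return float(np.mean(np.maximum(image, 0.0) ** 2))

def multi_view_generator_loss(z, num_views=4):
    # Multi-view regularization (sketch): build the scene ONCE per latent,
    # render it from several random cameras, and average the loss, so the
    # generator is penalized whenever quality holds up for only one view.
    scene = generate_scene(z)
    losses = []
    for _ in range(num_views):
        cam, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random rotation
        losses.append(discriminator_loss(render(scene, cam)))
    return float(np.mean(losses))

z = rng.normal(size=30)  # one latent code, no camera conditioning
loss = multi_view_generator_loss(z)
```

Because the scene is built once and only the cameras vary, the extra cost per step is a few additional renders, which is consistent with the abstract's claim of minimal computational overhead.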

Abstract (translated)

Recently, 3D GANs based on 3D Gaussian splatting have been proposed for high-quality synthesis of human heads. However, existing methods stabilize training and improve rendering quality from steep viewpoints by conditioning the random latent vector on the current camera position. This compromises 3D consistency, since noticeable identity changes occur when the 3D head is re-synthesized as the camera moves. Conversely, fixing the camera to a single viewpoint yields high-quality renderings for that perspective but performs poorly for novel views, and removing view-conditioning typically destabilizes GAN training, often causing it to collapse. To address these challenges, we introduce CGS-GAN, a novel 3D Gaussian splatting GAN framework that enables stable training and high-quality, 3D-consistent synthesis of human heads without relying on view-conditioning. To ensure training stability, we propose a multi-view regularization technique that improves generator convergence at almost no additional computational cost. In addition, we adapt the conditional loss used in existing 3D Gaussian splatting GANs and design a new generator architecture that not only stabilizes training but also enables efficient rendering and straightforward scaling, supporting output resolutions up to $2048^2$. To evaluate the capabilities of CGS-GAN, we curate a new dataset derived from FFHQ. This dataset supports very high resolutions, focuses on larger portions of the human head, reduces view-dependent artifacts for improved 3D consistency, and excludes images in which subjects are occluded by hands or other objects. As a result, our approach achieves very high rendering quality, supported by competitive FID (Fréchet Inception Distance) scores, while ensuring consistent 3D scene generation. For more information, visit our project page: this https URL

URL

https://arxiv.org/abs/2505.17590

PDF

https://arxiv.org/pdf/2505.17590.pdf

