Abstract
Recently, 3D GANs based on 3D Gaussian splatting have been proposed for high-quality synthesis of human heads. However, existing methods stabilize training and enhance rendering quality from steep viewpoints by conditioning the random latent vector on the current camera position. This compromises 3D consistency, as we observe significant identity changes when re-synthesizing the 3D head with each camera shift. Conversely, fixing the camera to a single viewpoint yields high-quality renderings for that perspective but results in poor performance for novel views. Removing view-conditioning typically destabilizes GAN training, often causing it to collapse. In response to these challenges, we introduce CGS-GAN, a novel 3D Gaussian splatting GAN framework that enables stable training and high-quality, 3D-consistent synthesis of human heads without relying on view-conditioning. To ensure training stability, we introduce a multi-view regularization technique that enhances generator convergence with minimal computational overhead. Additionally, we adapt the conditional loss used in existing 3D Gaussian splatting GANs and propose a generator architecture designed not only to stabilize training but also to facilitate efficient rendering and straightforward scaling, enabling output resolutions up to $2048^2$. To evaluate the capabilities of CGS-GAN, we curate a new dataset derived from FFHQ. This dataset enables very high resolutions, focuses on larger portions of the human head, reduces view-dependent artifacts for improved 3D consistency, and excludes images where subjects are obscured by hands or other objects. As a result, our approach achieves very high rendering quality, supported by competitive FID scores, while ensuring consistent 3D scene generation. Check out our project page here: this https URL
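The multi-view regularization mentioned above could be sketched roughly as follows: render the same generated 3D scene from several random camera poses within one generator step and average the adversarial loss, so the generator cannot overfit a scene to a single viewpoint. This is a minimal NumPy sketch with hypothetical stand-in functions (`generate_scene`, `render`, `d_fake_score` are placeholders, not the paper's actual generator, splatting renderer, or discriminator), intended only to illustrate the averaging structure, not the real implementation.

```python
# Minimal sketch of multi-view regularization with NumPy stand-ins.
# All model components below are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)

def generate_scene(z):
    # Hypothetical generator: maps a latent to a tiny "3D Gaussian" scene
    # (here just per-point means and features derived from the latent).
    return {"means": np.outer(z[:8], np.ones(3)), "feat": z}

def render(scene, cam):
    # Hypothetical splatting renderer: a view-dependent transform of the
    # scene features standing in for projecting Gaussians to an image.
    return np.tanh(scene["feat"] * np.cos(cam))

def d_fake_score(img):
    # Hypothetical discriminator loss on a rendered fake image.
    return float(np.mean(img ** 2))

def multiview_generator_loss(z, num_views=4):
    scene = generate_scene(z)            # build the 3D scene once
    losses = []
    for _ in range(num_views):           # re-render under random cameras
        cam = rng.uniform(-np.pi, np.pi)
        losses.append(d_fake_score(render(scene, cam)))
    return sum(losses) / num_views       # average over the sampled views

z = rng.standard_normal(16)
loss = multiview_generator_loss(z)
```

Because the scene is generated once and only the rendering step is repeated, the extra cost per step is limited to the (comparatively cheap) additional rasterization passes, which is consistent with the abstract's claim of minimal computational overhead.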
URL
https://arxiv.org/abs/2505.17590