Abstract
Inspired by the effectiveness of 3D Gaussian Splatting (3DGS) in reconstructing detailed 3D scenes within multi-view setups, and by the emergence of large 2D human foundation models, we introduce Arc2Avatar, the first method based on Score Distillation Sampling (SDS) to use a human face foundation model as guidance with just a single image as input. To achieve this, we extend such a model for diverse-view human head generation by fine-tuning it on synthetic data and modifying its conditioning. Our avatars maintain dense correspondence with a human face mesh template, allowing blendshape-based expression generation. This is achieved through a modified 3DGS approach, connectivity regularizers, and a strategic initialization tailored to our task. Additionally, we propose an optional, efficient SDS-based correction step that refines the blendshape expressions, enhancing realism and diversity. Experiments demonstrate that Arc2Avatar achieves state-of-the-art realism and identity preservation. Moreover, our strong identity prior and initialization strategy permit very low guidance, which effectively addresses color issues without compromising detail.
URL
https://arxiv.org/abs/2501.05379