Abstract
Recent advances in diffusion-based models have driven significant progress, particularly in identity-preserved portrait generation (IPG). However, when given multiple reference images of the same identity (ID), existing methods typically produce lower-fidelity portraits and struggle to customize facial attributes precisely. To address these issues, this paper presents HiFi-Portrait, a high-fidelity method for zero-shot portrait generation. Specifically, we first introduce a face refiner and a landmark generator to obtain fine-grained multi-face features and 3D-aware face landmarks. The landmarks encode information about both the reference ID and the target attributes. Then, we design HiFi-Net to fuse the multi-face features and align them with the landmarks, which improves ID fidelity and face control. In addition, we devise an automated pipeline to construct an ID-based dataset for training HiFi-Portrait. Extensive experimental results demonstrate that our method surpasses state-of-the-art (SOTA) approaches in face similarity and controllability. Furthermore, our method is also compatible with previous SDXL-based works.
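The abstract does not say how HiFi-Net fuses features from several reference images, so the sketch below is only a point of reference, not the paper's method. It shows a common simple baseline for multi-reference identity conditioning: L2-normalize each per-image face embedding, average them, and re-normalize. It also includes cosine similarity, which is one common way face similarity between embeddings is scored. The function names (`fuse_reference_embeddings`, `cosine_similarity`) and the plain-list vector representation are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def fuse_reference_embeddings(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical baseline (NOT HiFi-Net): average the unit-normalized
    face embeddings of one identity, then re-normalize the mean.

    Raises ValueError if the list is empty, dimensions differ, or the
    normalized embeddings cancel out to a zero mean.
    """
    if not embeddings:
        raise ValueError("need at least one reference embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must share one dimension")
    # Normalize first so no single reference dominates by magnitude.
    unit = [l2_normalize(e) for e in embeddings]
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return l2_normalize(mean)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must share one dimension")
    ua, ub = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(ua, ub))


if __name__ == "__main__":
    refs = [[3.0, 0.0], [0.0, 2.0]]  # toy 2-D "face embeddings"
    fused = fuse_reference_embeddings(refs)
    print(fused)
    print(cosine_similarity(fused, [1.0, 0.0]))
```

Averaging is order-independent and cheap, but it treats every reference equally and discards per-image detail, which is one plausible reason methods like the one described here use a learned fusion network instead.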
URL
https://arxiv.org/abs/2512.14542