Abstract
Current face editing methods mainly rely on GAN-based techniques, but recent focus has shifted to diffusion-based models due to their success in image reconstruction. However, diffusion models still face challenges in manipulating fine-grained attributes and preserving the consistency of attributes that should remain unchanged. To address these issues and facilitate more convenient editing of face images, we propose a novel approach that leverages the power of Stable-Diffusion models and crude 3D face models to control the lighting, facial expression, and head pose of a portrait photo. We observe that this task essentially involves combinations of target background, identity, and different face attributes, and we aim to sufficiently disentangle the control of these factors to enable high-quality face editing. Specifically, our method, coined RigFace, contains: 1) a Spatial Attribute Encoder that provides precise and decoupled conditions for background, pose, expression, and lighting; 2) an Identity Encoder that transfers identity features to the denoising UNet of a pre-trained Stable-Diffusion model; and 3) an Attribute Rigger that injects those conditions into the denoising UNet. Our model achieves comparable or even superior performance in both identity preservation and photorealism compared to existing face editing models.
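To make the three-component pipeline concrete, below is a minimal PyTorch sketch of how such modules could fit together. This is an illustrative assumption, not the authors' implementation: all module internals, tensor shapes, the channel counts, and the ControlNet-style zero-convolution injection in the Attribute Rigger are hypothetical stand-ins for the real Stable-Diffusion UNet integration.

```python
# Hypothetical sketch of the RigFace component layout (not the paper's code).
import torch
import torch.nn as nn

class SpatialAttributeEncoder(nn.Module):
    """Encodes stacked condition maps (background, pose, expression, lighting
    renderings) into a single spatial feature map. Channel counts are assumed."""
    def __init__(self, in_channels: int = 12, out_channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, out_channels, 3, padding=1),
        )

    def forward(self, cond_maps: torch.Tensor) -> torch.Tensor:
        return self.net(cond_maps)

class IdentityEncoder(nn.Module):
    """Maps a source face image to a sequence of identity tokens that a
    denoising UNet could attend to via cross-attention."""
    def __init__(self, token_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=4), nn.SiLU(),
            nn.AdaptiveAvgPool2d(4),            # -> (B, 16, 4, 4)
        )
        self.proj = nn.Linear(16, token_dim)

    def forward(self, id_image: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(id_image)          # (B, 16, 4, 4)
        tokens = feat.flatten(2).transpose(1, 2)  # (B, 16 tokens, 16)
        return self.proj(tokens)                # (B, 16, token_dim)

class AttributeRigger(nn.Module):
    """Injects the spatial condition features into a UNet feature map through a
    zero-initialized 1x1 conv, so the injection starts as an identity mapping."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.zero_conv = nn.Conv2d(channels, channels, 1)
        nn.init.zeros_(self.zero_conv.weight)
        nn.init.zeros_(self.zero_conv.bias)

    def forward(self, unet_feat: torch.Tensor, cond_feat: torch.Tensor) -> torch.Tensor:
        return unet_feat + self.zero_conv(cond_feat)

if __name__ == "__main__":
    B, H, W = 2, 64, 64
    cond_maps = torch.randn(B, 12, H, W)        # stacked condition renderings
    id_image = torch.randn(B, 3, H, W)          # source identity image
    unet_feat = torch.randn(B, 64, H, W)        # stand-in for a UNet block output

    spatial = SpatialAttributeEncoder()(cond_maps)
    rigged = AttributeRigger()(unet_feat, spatial)
    id_tokens = IdentityEncoder()(id_image)
    print(rigged.shape, id_tokens.shape)        # (2, 64, 64, 64), (2, 16, 64)
```

In this sketch the Rigger's zero-initialized convolution means the conditions contribute nothing at the start of fine-tuning and are learned gradually, a common choice for injecting new conditions into a pre-trained denoising UNet; whether RigFace uses this exact mechanism is an assumption here.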
URL
https://arxiv.org/abs/2502.02465