Abstract
This technical report presents a diffusion-model-based framework for face swapping between two portrait images. The basic framework consists of three components, i.e., IP-Adapter, ControlNet, and Stable Diffusion's inpainting pipeline, for face feature encoding, multi-conditional generation, and face inpainting, respectively. In addition, I introduce facial guidance optimization and CodeFormer-based blending to further improve the generation quality. Specifically, I employ a recent lightweight customization method (i.e., DreamBooth-LoRA) to guarantee identity consistency by 1) using a rare identifier, "sks", to represent the source identity, and 2) injecting the image features of the source portrait into each cross-attention layer in the same way as the text features. I then leverage the strong inpainting ability of Stable Diffusion, using the Canny edge map and face-detection annotation of the target portrait as conditions to guide ControlNet's generation and align the source portrait with the target portrait. To further correct the face alignment, I add a facial guidance loss that optimizes the text embedding during sample generation.
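The facial guidance step described above can be sketched as a small gradient update on the text embedding. The sketch below is a hedged, dependency-free illustration, not the paper's implementation: `id_emb_fn` is a hypothetical stand-in for the full render-then-face-recognition pathway, and the gradient is approximated by finite differences so the example stays self-contained.

```python
def facial_guidance_step(text_emb, id_emb_fn, src_id_emb, lr=0.1):
    """One facial-guidance update (illustrative sketch).

    Nudges the text embedding so that the identity embedding of the
    generated face (here the hypothetical stand-in `id_emb_fn`) moves
    toward the source identity embedding, under a squared-error loss.
    A finite-difference gradient keeps the sketch dependency-free.
    """
    eps = 1e-4

    def loss(emb):
        pred = id_emb_fn(emb)
        return sum((p - s) ** 2 for p, s in zip(pred, src_id_emb))

    base = loss(text_emb)
    grad = []
    for i in range(len(text_emb)):
        bumped = list(text_emb)
        bumped[i] += eps
        grad.append((loss(bumped) - base) / eps)

    # Gradient-descent step on the embedding, as in the guidance loop.
    return [e - lr * g for e, g in zip(text_emb, grad)]
```

In the actual framework this update would run inside the sampling loop, with the identity loss computed by a face-recognition network on the decoded image rather than the toy stand-in used here.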
URL
https://arxiv.org/abs/2403.01108