Abstract
Recent advances in personalized image generation have revealed the intriguing ability of pre-trained text-to-image models to learn identity information from a collection of portrait images. However, existing solutions often struggle to produce truthful details and usually suffer from several defects: (i) the generated face exhibits its own characteristics, \ie, its facial shape and the positioning of its facial features may not resemble those of the input, and (ii) the synthesized face may contain warped, blurred, or corrupted regions. In this paper, we present FaceChain, a personalized portrait generation framework that combines a series of customized image-generation models with a rich set of face-related perceptual understanding models (\eg, face detection, deep face embedding extraction, and facial attribute recognition) to tackle the aforementioned challenges and generate truthful personalized portraits from only a handful of portrait images. Concretely, we inject several state-of-the-art face models into the generation procedure, achieving more efficient label tagging, data processing, and model post-processing than previous solutions such as DreamBooth~\cite{ruiz2023dreambooth}, InstantBooth~\cite{shi2023instantbooth}, or other LoRA-only approaches~\cite{hu2021lora}. Through the development of FaceChain, we have identified several potential directions for accelerating face- and human-centric AIGC research and applications. We have designed FaceChain as a framework composed of pluggable components that can be easily adjusted to accommodate different styles and personalized needs. We hope it can grow to serve the burgeoning needs of the community. FaceChain is open-sourced under the Apache-2.0 license at \url{this https URL}.
URL
https://arxiv.org/abs/2308.14256