ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

Abstract

Diffusion-based technologies have made significant strides, particularly in personalized and customized facial generation. However, existing methods face challenges in achieving high-fidelity and detailed identity (ID) consistency, primarily due to insufficient fine-grained control over facial areas and the lack of a comprehensive ID-preservation strategy that fully considers both intricate facial details and the overall face. To address these limitations, we introduce ConsistentID, an innovative method crafted for diverse identity-preserving portrait generation under fine-grained multimodal facial prompts, utilizing only a single reference image. ConsistentID comprises two key components: a multimodal facial prompt generator that combines facial features, corresponding facial descriptions, and the overall facial context to enhance precision in facial details, and an ID-preservation network optimized through the facial attention localization strategy, aimed at preserving ID consistency in facial regions. Together, these components significantly enhance the accuracy of ID preservation by introducing fine-grained multimodal ID information from facial regions. To facilitate training of ConsistentID, we present a fine-grained portrait dataset, FGID, with over 500,000 facial images, offering greater diversity and comprehensiveness than existing public facial datasets such as LAION-Face, CelebA, FFHQ, and SFHQ. Experimental results show that ConsistentID achieves exceptional precision and diversity in personalized facial generation, surpassing existing methods on the MyStyle dataset. Furthermore, while ConsistentID introduces more multimodal ID information, it maintains a fast inference speed during generation.

Abstract (translated)

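Diffusion-based techniques have made remarkable progress, particularly in personalized facial generation. However, existing methods struggle to achieve high-fidelity, detailed identity consistency, mainly because of insufficient fine-grained control over facial regions and the absence of an ID-preservation strategy that fully considers both facial details and the whole face. To address these limitations, we introduce ConsistentID, an innovative method for generating diverse identity-preserving portraits under fine-grained multimodal facial prompts, using only a single reference image. ConsistentID comprises two key components: a multimodal facial prompt generator, which combines facial features, the corresponding facial descriptions, and the overall facial context to improve the precision of facial details, and an ID-preservation network optimized through a facial attention localization strategy, which aims to keep ID consistent within facial regions. Together, these components significantly improve the accuracy of ID preservation by introducing fine-grained multimodal ID information from facial regions. To facilitate training of ConsistentID, we present FGID, a fine-grained portrait dataset of more than 500,000 facial images that offers greater diversity and completeness than existing public facial datasets (such as LAION-Face, CelebA, FFHQ, and SFHQ). Experimental results confirm that ConsistentID achieves exceptional precision and diversity in personalized facial generation, surpassing existing methods on the MyStyle dataset. In addition, although ConsistentID introduces more multimodal ID information, it maintains a fast inference speed during generation.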

URL

https://arxiv.org/abs/2404.16771

PDF

https://arxiv.org/pdf/2404.16771.pdf
