Abstract
Face sketch synthesis aims to convert face photos into sketches. Existing research relies mainly on training with large numbers of photo-sketch sample pairs from existing datasets. However, such large-scale discriminative learning methods face problems such as data scarcity and high human labeling cost: once training data becomes scarce, their generative performance degrades significantly. In this paper, we propose a one-shot face sketch synthesis method based on diffusion models. We optimize text instructions on a diffusion model using a face photo-sketch image pair; the instructions derived through this gradient-based optimization are then used for inference. To simulate real-world scenarios more accurately and evaluate effectiveness more comprehensively, we introduce a new benchmark named the One-shot Face Sketch Dataset (OS-Sketch). The benchmark consists of 400 face photo-sketch image pairs, covering sketches in different styles and photos varying in background, age, sex, expression, illumination, etc. For a solid out-of-distribution evaluation, we select only one image pair for training at a time and use the rest for inference. Extensive experiments demonstrate that the proposed method converts diverse photos into realistic and highly consistent sketches in a one-shot setting. Compared with other methods, our approach offers greater convenience and broader applicability. The dataset will be available at: this https URL
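The core idea in the abstract, optimizing only a conditioning "instruction" by gradient descent against a single photo-sketch pair while the generator stays frozen, can be illustrated with a toy sketch. This is not the paper's method: the diffusion model, the `generate` function, and all dimensions below are stand-in assumptions (a frozen linear map replaces the diffusion model) used purely to show the one-shot optimization pattern.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16                                   # toy "image" feature dimension
W = rng.standard_normal((D, D)) * 0.1    # frozen generator weights (never trained)

def generate(photo, instruction):
    """Stand-in generator: maps a photo, conditioned on the instruction, to a sketch."""
    return W @ photo + instruction

# the single (one-shot) training pair
photo = rng.standard_normal(D)
sketch = rng.standard_normal(D)

# gradient-based optimization of the instruction embedding ONLY
instruction = np.zeros(D)
lr = 0.1
for _ in range(200):
    pred = generate(photo, instruction)
    grad = 2.0 * (pred - sketch)         # gradient of ||pred - sketch||^2 w.r.t. instruction
    instruction -= lr * grad

loss = float(np.sum((generate(photo, instruction) - sketch) ** 2))
print(f"final one-shot training loss: {loss:.2e}")

# inference: the learned instruction is reused on a NEW photo
new_photo = rng.standard_normal(D)
new_sketch = generate(new_photo, instruction)
```

Because the generator is frozen and linear here, the residual shrinks geometrically and the instruction converges; in the actual method the generator is a pretrained text-to-image diffusion model and the optimized quantity is a text instruction, but the division of labor is the same: one trainable conditioning vector, one training pair, everything else fixed.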
URL
https://arxiv.org/abs/2506.15312