Abstract
The scarcity of free-hand sketches presents a challenging problem. Although some large-scale sketch datasets have emerged, they consist primarily of single-object sketches, and large-scale paired datasets for scene sketches remain lacking. In this paper, we propose a self-supervised method for scene sketch generation that does not rely on any existing scene sketch, enabling the transformation of single-object sketches into scene sketches. To accomplish this, we introduce methods for vector sketch captioning and sketch semantic expansion. Additionally, we design a sketch generation network that fuses multi-modal perceptual constraints and is applicable to the zero-shot image-to-sketch downstream task, where experiments demonstrate state-of-the-art performance. Finally, leveraging our proposed sketch-to-sketch generation method, we contribute a large-scale dataset centered around scene sketches, comprising highly semantically consistent "text-sketch-image" triplets. Our research confirms that this dataset can significantly enhance the capabilities of existing models in sketch-based image retrieval and sketch-controlled image synthesis tasks. We will make our dataset and code publicly available.
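The abstract does not specify the network architecture or the exact losses behind the "fusion of multi-modal perceptual constraints." As a rough illustration only, such a constraint might resemble the following CLIP-based sketch, where the function name multimodal_perceptual_loss, the weights w_text and w_image, and the use of OpenAI CLIP as the shared encoder are all assumptions, not the paper's method.

    # Hypothetical sketch: fuse a text-side and an image-side perceptual
    # constraint in a shared CLIP embedding space. Assumes the openai/CLIP
    # package; all names and weights are illustrative, not from the paper.
    import torch
    import clip

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    def multimodal_perceptual_loss(sketch_render, reference_image, caption,
                                   w_text=1.0, w_image=1.0):
        """Pull a rendered sketch toward both a caption and a reference image
        in CLIP space; inputs are CLIP-preprocessed (B, 3, 224, 224) tensors.
        Lower is better."""
        tokens = clip.tokenize([caption]).to(device)
        with torch.no_grad():
            text_emb = model.encode_text(tokens)          # fixed text target
            image_emb = model.encode_image(reference_image)  # fixed image target
        sketch_emb = model.encode_image(sketch_render)    # gradients reach the sketch
        cos = torch.nn.functional.cosine_similarity
        loss_text = 1 - cos(sketch_emb, text_emb).mean()
        loss_image = 1 - cos(sketch_emb, image_emb).mean()
        return w_text * loss_text + w_image * loss_image

In an optimization loop, sketch_render would be a differentiable rasterization of the vector sketch, so minimizing this loss nudges the strokes toward semantic agreement with both modalities at once; the actual constraint fusion used by the authors may differ.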
URL
https://arxiv.org/abs/2405.18801