Abstract
This paper unravels the potential of sketches for diffusion models, addressing the deceptive promise of direct sketch control in generative AI. Importantly, we democratise the process, enabling amateur sketches to generate precise images and living up to the commitment of "what you sketch is what you get". A pilot study underscores the necessity, revealing that deformities in existing models stem from spatial conditioning. To rectify this, we propose an abstraction-aware framework that combines a sketch adapter, adaptive time-step sampling, and discriminative guidance from a pre-trained fine-grained sketch-based image retrieval (FG-SBIR) model, all working synergistically to reinforce fine-grained sketch-photo association. Our approach operates seamlessly at inference without the need for textual prompts; a simple, rough sketch akin to what you and I can create suffices! We welcome everyone to examine the results presented in the paper and its supplementary material. In short, the contributions are: democratising sketch control, introducing an abstraction-aware framework, and leveraging discriminative guidance, all validated through extensive experiments.
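To make the guidance mechanism concrete, below is a minimal, illustrative PyTorch sketch of the general idea behind discriminative guidance: a classifier-guidance-style denoising loop in which an embedding similarity between the input sketch and the predicted clean image nudges each step. All names here (Denoiser, Encoder, guided_sample) and the toy architectures are hypothetical placeholders of ours, not the paper's implementation; the actual framework additionally employs a sketch adapter and adaptive time-step sampling, which this sketch omits.

```python
# Illustrative sketch (NOT the authors' code): classifier-guidance-style
# discriminative guidance from a retrieval-style model inside a DDPM-like
# sampling loop. Denoiser and Encoder are toy stand-ins; the paper uses a
# sketch adapter and a pre-trained fine-grained SBIR model instead.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

class Denoiser(torch.nn.Module):
    """Toy stand-in for a sketch-conditioned noise predictor."""
    def __init__(self, ch=3):
        super().__init__()
        self.net = torch.nn.Conv2d(2 * ch, ch, 3, padding=1)
    def forward(self, x_t, sketch, t):
        return self.net(torch.cat([x_t, sketch], dim=1))

class Encoder(torch.nn.Module):
    """Toy stand-in for one branch of a retrieval (FG-SBIR-like) model."""
    def __init__(self, ch=3, dim=16):
        super().__init__()
        self.proj = torch.nn.Conv2d(ch, dim, 4, stride=4)
    def forward(self, x):
        return F.normalize(self.proj(x).mean(dim=(2, 3)), dim=-1)

def guided_sample(denoiser, sketch_enc, photo_enc, sketch,
                  steps=50, guidance_scale=1.0):
    """DDPM-style loop with an extra gradient step that pulls the
    predicted clean image toward the sketch in embedding space."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)
    x = torch.randn(1, 3, 32, 32)
    s_emb = sketch_enc(sketch).detach()
    for t in reversed(range(steps)):
        a_bar = alphas_bar[t]
        x = x.detach().requires_grad_(True)
        eps = denoiser(x, sketch, t)
        # Predict the clean image x0 from the current noisy sample.
        x0 = (x - (1 - a_bar).sqrt() * eps) / a_bar.sqrt()
        # Discriminative guidance: cosine similarity between the sketch
        # embedding and the embedding of the predicted photo.
        sim = F.cosine_similarity(photo_enc(x0), s_emb).sum()
        grad = torch.autograd.grad(sim, x)[0]
        with torch.no_grad():
            # Standard DDPM posterior mean, then a nudge up the
            # similarity gradient (simplified classifier guidance).
            mean = (x - betas[t] / (1 - a_bar).sqrt() * eps) / (1 - betas[t]).sqrt()
            mean = mean + guidance_scale * grad
            x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
    return x.detach()

# Usage with random tensors standing in for a real sketch:
img = guided_sample(Denoiser(), Encoder(), Encoder(), torch.randn(1, 3, 32, 32))
print(img.shape)  # torch.Size([1, 3, 32, 32])
```

The design point this illustrates is that the retrieval model supplies a differentiable, fine-grained sketch-photo matching signal at every denoising step, rather than conditioning on the sketch only spatially.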
URL
https://arxiv.org/abs/2403.07234