Abstract
Recent advances in diffusion models have significantly improved text-to-image (T2I) generation, but these models often struggle to balance fine-grained precision with high-level control. Methods like ControlNet and T2I-Adapter excel at following sketches by seasoned artists but tend to be overly rigid, replicating unintentional flaws in sketches from novice users. Meanwhile, coarse-grained methods, such as sketch-based abstraction frameworks, offer more accessible input handling but lack the precise control needed for detailed, professional use. To address these limitations, we propose KnobGen, a dual-pathway framework that democratizes sketch-based image generation by seamlessly adapting to varying levels of sketch complexity and user skill. KnobGen uses a Coarse-Grained Controller (CGC) module for high-level semantics and a Fine-Grained Controller (FGC) module for detailed refinement, and the relative strength of the two modules can be adjusted through our knob inference mechanism to match the user's specific needs. Together, these mechanisms let KnobGen flexibly generate images from both novice sketches and those drawn by seasoned artists, maintaining control over the final output while preserving the image's natural appearance, as demonstrated on the MultiGen-20M dataset and a newly collected sketch dataset.
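To make the dual-pathway idea concrete, here is a minimal, hypothetical sketch of how a scalar knob could blend a coarse-grained and a fine-grained conditioning pathway. The module names (CoarseGrainedController, FineGrainedController, knob_blend) and the linear interpolation rule are illustrative assumptions for this sketch, not the paper's actual implementation, which may gate the fine-grained signal differently (e.g., across denoising timesteps).

```python
# Hedged sketch of a knob-style blend between two conditioning pathways.
# All class/function names here are hypothetical stand-ins, not KnobGen's API.
import torch
import torch.nn as nn


class CoarseGrainedController(nn.Module):
    """Stand-in CGC: maps a sketch to a high-level semantic residual."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, sketch: torch.Tensor) -> torch.Tensor:
        return self.net(sketch)


class FineGrainedController(nn.Module):
    """Stand-in FGC: extracts detailed, stroke-faithful features."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, channels, 1), nn.SiLU(),
            nn.Conv2d(channels, channels, 1),
        )

    def forward(self, sketch: torch.Tensor) -> torch.Tensor:
        return self.net(sketch)


def knob_blend(cgc_feat: torch.Tensor, fgc_feat: torch.Tensor,
               knob: float) -> torch.Tensor:
    """Blend the two conditioning residuals with a single scalar knob.

    knob = 0.0 -> coarse semantics only (forgiving of novice sketches)
    knob = 1.0 -> full fine-grained fidelity (for seasoned artists)
    """
    knob = max(0.0, min(1.0, knob))  # clamp to [0, 1]
    return (1.0 - knob) * cgc_feat + knob * fgc_feat


if __name__ == "__main__":
    sketch = torch.randn(1, 1, 64, 64)  # dummy single-channel sketch
    cgc, fgc = CoarseGrainedController(), FineGrainedController()
    cond = knob_blend(cgc(sketch), fgc(sketch), knob=0.3)
    print(cond.shape)  # torch.Size([1, 64, 64, 64])
```

A single scalar knob keeps the interface simple: low values forgive crude strokes by leaning on high-level semantics, while high values enforce stroke fidelity for detailed, professional sketches.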
URL
https://arxiv.org/abs/2410.01595