Abstract
Recent advances in text-to-image (T2I) generation have shifted from adapting text to fixed backgrounds toward creating images around text. Traditional approaches are often limited to generating layouts within static images for effective text placement. Our proposed approach, TextCenGen, introduces dynamic adaptation of blank regions for text-friendly image generation, emphasizing text-centric design and visual harmony. Our method employs force-directed attention guidance in T2I models to generate images that strategically reserve whitespace for pre-defined text areas, even for text or icons placed at the golden ratio. Observing how cross-attention maps affect object placement, we detect and repel conflicting objects using a force-directed graph approach, combined with a Spatial Excluding Cross-Attention Constraint that keeps attention smooth in whitespace areas. Framing this as a novel task in graphic design, our experiments indicate that TextCenGen outperforms existing methods, producing more harmonious compositions. Furthermore, our method significantly improves T2I model outputs on our specially collected prompt datasets covering varied text positions. These results demonstrate the efficacy of TextCenGen in creating more harmonious and integrated text-image compositions.
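The core idea of repelling objects from a reserved text region via cross-attention can be sketched roughly as follows. This is a hypothetical illustration, not the authors' implementation: it assumes each object is summarized by the center of mass of its cross-attention map, and applies a distance-decaying repulsive displacement away from the center of the reserved region, in the spirit of a force-directed layout.

```python
# Hypothetical sketch (not TextCenGen's actual code): repel an object's
# attention centroid away from a reserved text region's center.
import numpy as np

def attention_centroid(attn_map):
    """Center of mass (row, col) of a 2D cross-attention map."""
    h, w = attn_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    total = attn_map.sum()
    return np.array([(ys * attn_map).sum(), (xs * attn_map).sum()]) / total

def repulsive_shift(centroid, region_center, strength=8.0):
    """Force-directed repulsion: a displacement pushing the centroid
    away from the region center, decaying with squared distance."""
    d = centroid - region_center
    dist = np.linalg.norm(d) + 1e-8  # avoid division by zero
    return strength * d / (dist ** 2)

# Toy example: a 16x16 attention map with a blob near the reserved region.
attn = np.zeros((16, 16))
attn[6:9, 6:9] = 1.0                       # object attention blob
c = attention_centroid(attn)               # -> (7.0, 7.0)
shift = repulsive_shift(c, np.array([8.0, 8.0]))
# shift points away from the region center (toward the top-left here)
```

In a real diffusion pipeline, such a displacement would be turned into a guidance signal on the cross-attention maps at each denoising step rather than applied to pixel coordinates directly.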
URL
https://arxiv.org/abs/2404.11824