Paper Reading AI Learner

TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation

2024-04-18 01:10:24
Tianyi Liang, Jiangqi Liu, Sicheng Song, Shiqi Jiang, Yifei Huang, Changbo Wang, Chenhui Li

Abstract

Recent advances in text-to-image (T2I) generation have seen a shift from adapting text to fixed backgrounds toward creating images around text. Traditional approaches are often limited to generating layouts within static images for effective text placement. Our proposed approach, TextCenGen, introduces dynamic adaptation of blank regions for text-friendly image generation, emphasizing text-centric design and visually harmonious generation. Our method employs force-directed attention guidance in T2I models to generate images that strategically reserve whitespace for predefined text areas, even for text or icons placed at the golden ratio. Observing how cross-attention maps affect object placement, we detect conflicting objects and repel them using a force-directed graph approach, combined with a Spatial Excluding Cross-Attention Constraint that keeps attention smooth in whitespace areas. Experiments on this novel graphic-design task indicate that TextCenGen outperforms existing methods, producing more harmonious compositions. Furthermore, our method significantly improves T2I model outputs on our specially collected prompt datasets covering varied text positions. These results demonstrate the efficacy of TextCenGen in creating more harmonious, integrated text-image compositions.
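The core idea in the abstract — locating an object via its cross-attention map and repelling it from a reserved text region with a force-directed update — can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the function names, the inverse-square force law, and the integer map shift are all assumptions for demonstration.

```python
import numpy as np

def attention_centroid(attn):
    """Mass-weighted centroid (row, col) of a 2D cross-attention map."""
    rows, cols = np.indices(attn.shape)
    total = attn.sum()
    return np.array([(rows * attn).sum() / total,
                     (cols * attn).sum() / total])

def repulsion_force(centroid, region_center, strength=1.0):
    """Force pushing an object's centroid away from the reserved text
    region; inverse-square falloff is an assumed, common choice."""
    diff = centroid - region_center
    dist = np.linalg.norm(diff) + 1e-8   # avoid division by zero
    return strength * diff / dist**2

def shift_attention(attn, force, step=1.0):
    """Translate the attention map along the force (rounded to pixels)."""
    dr, dc = np.round(step * force).astype(int)
    return np.roll(np.roll(attn, dr, axis=0), dc, axis=1)

# Toy example: an object's attention blob near the image center is
# repelled away from a text region reserved in the top-left corner.
attn = np.zeros((16, 16))
attn[7:10, 7:10] = 1.0                    # object attention mass
text_center = np.array([2.0, 2.0])        # hypothetical reserved text area
c = attention_centroid(attn)
f = repulsion_force(c, text_center, strength=50.0)
shifted = shift_attention(attn, f)
print(attention_centroid(shifted))        # centroid has moved away from the text region
```

In the actual method this kind of update would steer the diffusion model's cross-attention during sampling rather than shift a finished map, and the paper's Spatial Excluding Cross-Attention Constraint additionally suppresses attention inside the whitespace region; the sketch only conveys the force-directed geometry.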

URL

https://arxiv.org/abs/2404.11824

PDF

https://arxiv.org/pdf/2404.11824.pdf

