Abstract
Despite their exceptional generative abilities, large text-to-image diffusion models, much like skilled but careless artists, often struggle to depict visual relationships between objects accurately. Our analysis traces this issue to a misaligned text encoder that struggles to interpret specific relationships and to distinguish the logical order of the associated objects. To resolve it, we introduce a novel task termed Relation Rectification, which aims to refine the model so that it accurately represents a given relationship it initially fails to generate. As a solution, we propose a Heterogeneous Graph Convolutional Network (HGCN) that models the directional relationships between relation terms and their corresponding objects within the input prompts. Specifically, we optimize the HGCN on a pair of prompts with identical relational words but reversed object orders, supplemented by a few reference images. The lightweight HGCN adjusts the text embeddings generated by the text encoder so that the textual relation is accurately reflected in the embedding space. Crucially, our method leaves the parameters of the text encoder and the diffusion model unchanged, preserving the model's robust performance on unrelated descriptions. We validate our approach on a newly curated dataset of diverse relational data, demonstrating both quantitative and qualitative improvements in generating images with precise visual relations.
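The role of direction in the graph can be illustrated with a toy sketch. This is not the paper's HGCN: the two-dimensional embeddings, the "cat chases dog" prompt pair, the function `adjust_relation`, and the fixed weight matrices are all hypothetical, chosen only to show why separate weights per edge type let a relation embedding distinguish "A relation B" from "B relation A", while a single shared weight cannot.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def adjust_relation(relation, subject, obj, w_subject, w_object):
    """One message-passing step into the relation node (hypothetical sketch).

    The relation node receives one message over a subject-type edge and one
    over an object-type edge. Each edge type has its own weight matrix, and
    the summed message is added to the original relation embedding.
    """
    from_subject = matvec(w_subject, subject)
    from_object = matvec(w_object, obj)
    return [
        r + math.tanh(a + b)
        for r, a, b in zip(relation, from_subject, from_object)
    ]


# Toy embeddings; a real text encoder produces far larger vectors.
cat = [1.0, 0.0]
dog = [0.0, 1.0]
chases = [0.5, 0.5]

# Distinct weights per edge type (assumed values, not learned here).
W_SUBJECT = [[1.0, 0.0], [0.0, 1.0]]
W_OBJECT = [[-1.0, 0.0], [0.0, -1.0]]

# Same relation word, reversed object order.
cat_chases_dog = adjust_relation(chases, cat, dog, W_SUBJECT, W_OBJECT)
dog_chases_cat = adjust_relation(chases, dog, cat, W_SUBJECT, W_OBJECT)

# Order-blind baseline: both edge types share one weight matrix.
shared_a = adjust_relation(chases, cat, dog, W_SUBJECT, W_SUBJECT)
shared_b = adjust_relation(chases, dog, cat, W_SUBJECT, W_SUBJECT)

print(cat_chases_dog != dog_chases_cat)  # direction-aware weights separate the pair
print(shared_a == shared_b)              # shared weights collapse the pair
```

In the method described above, such weights would be optimized on the prompt pair and reference images while the text encoder and diffusion model stay frozen; the sketch fixes them by hand only to keep the example self-contained.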
URL
https://arxiv.org/abs/2403.20249