ANCHOR: LLM-driven News Subject Conditioning for Text-to-Image Synthesis

2024-04-15 21:19:10
Aashish Anantha Ramakrishnan, Sharon X. Huang, Dongwon Lee

Abstract

Text-to-Image (T2I) Synthesis has made tremendous strides in enhancing synthesized image quality, but current datasets evaluate model performance only on descriptive, instruction-based prompts. Real-world news image captions take a more pragmatic approach, providing high-level situational and Named-Entity (NE) information and limited physical object descriptions, making them abstractive. To evaluate the ability of T2I models to capture intended subjects from news captions, we introduce the Abstractive News Captions with High-level cOntext Representation (ANCHOR) dataset, containing 70K+ samples sourced from 5 different news media organizations. With Large Language Models (LLMs) achieving success in language and commonsense reasoning tasks, we explore the ability of different LLMs to identify and understand key subjects from abstractive captions. Our proposed method, Subject-Aware Finetuning (SAFE), selects and enhances the representation of key subjects in synthesized images by leveraging LLM-generated subject weights. It also adapts to the domain distribution of news images and captions through custom Domain Fine-tuning, outperforming current T2I baselines on ANCHOR. By launching the ANCHOR dataset, we hope to motivate research into furthering the Natural Language Understanding (NLU) capabilities of T2I models.
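To make the abstract's two-stage idea concrete, below is a minimal, hypothetical Python sketch: an LLM is prompted to assign importance weights to the subjects of a news caption, and those weights are then used to amplify the corresponding token embeddings before diffusion sampling. The model names, the prompt template, and the 0-1 weight scale are all assumptions, and this is generic inference-time prompt re-weighting built with Hugging Face transformers and diffusers, not the paper's SAFE fine-tuning procedure.

```python
import json
import torch
from transformers import pipeline
from diffusers import StableDiffusionPipeline

caption = ("Protesters gather outside the venue as delegates arrive "
           "for the climate summit in Glasgow.")

# Stage 1: ask an LLM for subject importance weights (prompt is illustrative).
llm = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")
prompt = ("List the key visual subjects in this news caption and give each an "
          "importance weight between 0 and 1, as a single JSON object.\n"
          f"Caption: {caption}\nJSON: ")
raw = llm(prompt, max_new_tokens=96, do_sample=False)[0]["generated_text"]
# Parsing free-form LLM output is fragile; a sketch-level shortcut.
subject_weights = {k.lower(): float(v)
                   for k, v in json.loads(raw[len(prompt):].strip()).items()}

# Stage 2: amplify the text embeddings of the weighted subject tokens.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
ids = pipe.tokenizer(caption, padding="max_length",
                     max_length=pipe.tokenizer.model_max_length,
                     truncation=True, return_tensors="pt").input_ids
with torch.no_grad():
    embeds = pipe.text_encoder(ids.to(pipe.device))[0]

# CLIP's BPE marks word ends with "</w>"; match tokens against subject words.
for i, tok in enumerate(pipe.tokenizer.convert_ids_to_tokens(ids[0])):
    word = tok.replace("</w>", "").lower()
    if word in subject_weights:
        embeds[0, i] *= 1.0 + subject_weights[word]

image = pipe(prompt_embeds=embeds).images[0]
image.save("anchor_sketch.png")
```

Note that SAFE, as the abstract describes it, injects the subject weights during fine-tuning and also performs Domain Fine-tuning on news images; the sketch above only mimics the subject-conditioning signal at inference time.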

URL

https://arxiv.org/abs/2404.10141

PDF

https://arxiv.org/pdf/2404.10141.pdf

