Abstract
In this paper, we investigate a novel artificial intelligence generation task, termed generated contents enrichment (GCE). Unlike conventional AI content generation, which implicitly enriches the given textual description with limited semantics to produce visually realistic content, our proposed GCE performs content enrichment explicitly in both the visual and textual domains, so that the enriched content is visually realistic, structurally reasonable, and semantically abundant. To solve GCE, we propose a deep end-to-end method that explicitly explores the semantics and inter-semantic relationships during enrichment. Specifically, we first model the input description as a semantic graph, in which each node represents an object and each edge corresponds to an inter-object relationship. We then apply Graph Convolutional Networks on top of the input scene description to predict the enriching objects and their relationships with the input objects. Finally, the enriched graph is fed into an image synthesis model to generate the visual content. Our experiments on the Visual Genome dataset show promising and visually plausible results.
Abstract (translated)
In this paper, we study a novel artificial intelligence generation task called generated contents enrichment (GCE). Unlike conventional AI content generation tasks, which implicitly enrich the given textual description with limited semantics to generate visually realistic content that remains semantically insufficient, GCE enriches the content explicitly. To solve GCE, we propose an end-to-end method that explicitly explores semantics and inter-semantic relationships during enrichment. Specifically, we first model the input description as a semantic graph, in which each node represents an object and each edge represents the interaction between objects. We then apply Graph Convolutional Networks on the input scene description to predict the enriching objects to be generated and their relationships with the input objects. Finally, the resulting enriched graph is fed into an image synthesis model to generate the visual content. Our experiments on the Visual Genome dataset demonstrate promising and visually plausible results.
URL
https://arxiv.org/abs/2405.03650
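The graph-convolution step described in the abstract (aggregating each object's features from its related objects before predicting enriching content) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy scene graph, feature dimensions, and the symmetrically normalized adjacency (the common Kipf-and-Welling GCN form) are all assumptions, since the abstract does not specify the exact GCN variant.

```python
import numpy as np

# Hypothetical toy scene graph for a description like "a man riding a horse":
# nodes are objects, edges are inter-object relationships (adjacency only here).
nodes = ["man", "horse"]
edges = [(0, 1)]  # man --riding--> horse

num_nodes = len(nodes)
feat_dim = 4

rng = np.random.default_rng(0)
X = rng.normal(size=(num_nodes, feat_dim))  # node features (e.g. embeddings)
W = rng.normal(size=(feat_dim, feat_dim))   # learnable layer weight (random here)

# Adjacency with self-loops, symmetrically normalized (assumed GCN form).
A = np.eye(num_nodes)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_hat = D_inv_sqrt @ A @ D_inv_sqrt

# One GCN layer: aggregate neighbor features, then a ReLU nonlinearity.
# A trained model would feed such features to heads that predict the
# enriching objects and their relationships with the input objects.
H = np.maximum(A_hat @ X @ W, 0.0)
print(H.shape)  # one aggregated feature vector per object
```

In the actual method these layers would be trained end-to-end, and the enriched graph would then be passed to an image synthesis model; the sketch only shows the message-passing core.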