Abstract
In this work, we systematically study the problem of personalized text-to-image generation, where the output image is expected to portray information about specific human subjects, e.g., generating images of oneself appearing in imaginative places, interacting with various items, or engaging in fictional activities. To this end, we focus on text-to-image systems that take as input a single image of an individual to ground the generation process, along with text describing the desired visual context. Our first contribution is to fill a gap in the literature by curating high-quality, appropriate data for this task. Namely, we introduce a standardized dataset (Stellar) that couples personalized prompts with images of individuals; it is an order of magnitude larger than existing relevant datasets and comes with rich semantic ground-truth annotations readily available. Building on Stellar, and to further promote fine-grained cross-system comparisons, we introduce a rigorous ensemble of specialized metrics that highlight and disentangle the fundamental properties such systems should obey. Besides being intuitive, our new metrics correlate significantly more strongly with human judgment than the metrics currently used for this task. Last but not least, drawing inspiration from the recent works of ELITE and SDXL, we derive a simple yet efficient, personalized text-to-image baseline that does not require test-time fine-tuning for each subject and which sets a new SoTA both quantitatively and in human trials. For more information, please visit our project's website: this https URL.
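As an illustration of the kind of metric-vs-human-judgment analysis the abstract refers to, below is a minimal sketch of measuring rank correlation between an automatic metric's scores and human ratings. This is not the paper's actual evaluation code; the variable names and data are hypothetical, and Spearman correlation is only one plausible choice of agreement measure.

# Hedged sketch: comparing an automatic metric's per-image scores with
# human ratings via Spearman rank correlation. All names and data here
# are hypothetical placeholders, not from the Stellar benchmark itself.
from scipy.stats import spearmanr

# Hypothetical per-image scores from an automatic metric (e.g., identity
# preservation) and averaged human ratings of the same generated images.
metric_scores = [0.81, 0.42, 0.67, 0.90, 0.55]
human_ratings = [4.5, 2.0, 3.5, 5.0, 3.0]

# Higher rho means the metric ranks images more like human annotators do.
rho, p_value = spearmanr(metric_scores, human_ratings)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3g})")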
URL
https://arxiv.org/abs/2312.06116