Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods

2023-12-11 04:47:39
Panos Achlioptas, Alexandros Benetatos, Iordanis Fostiropoulos, Dimitris Skourtis

Abstract

In this work, we systematically study the problem of personalized text-to-image generation, where the output image is expected to portray information about specific human subjects, e.g., generating images of oneself appearing at imaginative places, interacting with various items, or engaging in fictional activities. To this end, we focus on text-to-image systems that take as input a single image of an individual to ground the generation process, along with text describing the desired visual context. Our first contribution is to fill the literature gap by curating high-quality, appropriate data for this task. Namely, we introduce a standardized dataset (Stellar) that contains personalized prompts coupled with images of individuals, that is an order of magnitude larger than existing relevant datasets, and where rich semantic ground-truth annotations are readily available. Having established Stellar, and to further promote fine-grained cross-system comparisons, we introduce a rigorous ensemble of specialized metrics that highlight and disentangle fundamental properties such systems should obey. Besides being intuitive, our new metrics correlate significantly more strongly with human judgment than currently used metrics on this task. Last but not least, drawing inspiration from the recent works of ELITE and SDXL, we derive a simple yet efficient personalized text-to-image baseline that does not require test-time fine-tuning for each subject and that sets a new SoTA both quantitatively and in human trials. For more information, please visit our project's website: this https URL.

URL

https://arxiv.org/abs/2312.06116

PDF

https://arxiv.org/pdf/2312.06116.pdf
