Paper Reading AI Learner

Salient Object-Aware Background Generation using Text-Guided Diffusion Models

2024-04-15 22:13:35
Amir Erfan Eshratifar, Joao V. B. Soares, Kapil Thadani, Shaunak Mishra, Mikhail Kuznetsov, Yueh-Ning Ku, Paloma de Juan

Abstract

Generating background scenes for salient objects plays a crucial role across various domains including creative design and e-commerce, as it enhances the presentation and context of subjects by integrating them into tailored environments. Background generation can be framed as a task of text-conditioned outpainting, where the goal is to extend image content beyond a salient object's boundaries on a blank background. Although popular diffusion models for text-guided inpainting can also be used for outpainting by mask inversion, they are trained to fill in missing parts of an image rather than to place an object into a scene. Consequently, when used for background creation, inpainting models frequently extend the salient object's boundaries and thereby change the object's identity, which is a phenomenon we call "object expansion." This paper introduces a model for adapting inpainting diffusion models to the salient object outpainting task using Stable Diffusion and ControlNet architectures. We present a series of qualitative and quantitative results across models and datasets, including a newly proposed metric to measure object expansion that does not require any human labeling. Compared to Stable Diffusion 2.0 Inpainting, our proposed approach reduces object expansion by 3.6x on average with no degradation in standard visual metrics across multiple datasets.
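The abstract notes that inpainting models can be repurposed for outpainting by mask inversion: instead of marking a missing region inside the image, the mask marks everything outside the salient object, so the model generates the background around it. A minimal sketch of that inversion (the `outpainting_mask` helper and the toy arrays are illustrative, not from the paper):

```python
import numpy as np

def outpainting_mask(object_mask: np.ndarray) -> np.ndarray:
    """Invert a binary salient-object mask so an inpainting model
    fills the background instead of the object.

    object_mask: HxW array, 1 where the salient object is, 0 elsewhere.
    Returns an HxW array, 1 where content should be generated.
    """
    return 1 - object_mask

# Toy 4x4 example: the object occupies the 2x2 center.
obj = np.zeros((4, 4), dtype=np.uint8)
obj[1:3, 1:3] = 1
bg = outpainting_mask(obj)
# Object and background masks partition the image exactly.
assert np.array_equal(obj + bg, np.ones((4, 4), dtype=np.uint8))
```

The "object expansion" failure mode described above arises because the model was trained to fill holes, not to respect this inverted boundary, so generated content can bleed across the object's edge.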

URL

https://arxiv.org/abs/2404.10157

PDF

https://arxiv.org/pdf/2404.10157.pdf

