Paper Reading AI Learner

Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting

2024-03-29 00:40:12
Haipeng Liu, Yang Wang, Biao Qian, Meng Wang, Yong Rui

Abstract

Denoising diffusion probabilistic models for image inpainting add noise to the texture of an image during the forward process and recover the masked regions from the unmasked texture via the reverse denoising process. Although they generate meaningful semantics, existing methods suffer from a semantic discrepancy between masked and unmasked regions: the semantically dense unmasked texture is not completely degraded, while the masked regions turn into pure noise during the diffusion process, leading to a large discrepancy between them. In this paper, we aim to answer how unmasked semantics guide the texture denoising process, and how to tackle the semantic discrepancy, so as to facilitate consistent and meaningful semantics generation. To this end, we propose a novel structure-guided diffusion model named StrDiffusion, which reformulates the conventional texture denoising process under structure guidance to derive a simplified denoising objective for image inpainting, while revealing: 1) the semantically sparse structure helps tackle the semantic discrepancy in the early stage, while the dense texture generates reasonable semantics in the late stage; 2) the semantics from unmasked regions essentially offer time-dependent structure guidance for the texture denoising process, benefiting from the time-dependent sparsity of the structure semantics. For the denoising process, a structure-guided neural network is trained to estimate the simplified denoising objective by exploiting the consistency of the denoised structure between masked and unmasked regions. In addition, we devise an adaptive resampling strategy as a formal criterion for whether the structure is competent to guide the texture denoising process, while regulating their semantic correlations. Extensive experiments validate the merits of StrDiffusion over state-of-the-art methods. Our code is available at this https URL.
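The discrepancy described in the abstract can be illustrated with a small numerical sketch. The code below is not the StrDiffusion formulation; it assumes a generic DDPM-style forward step with a linear beta schedule, and the names `linear_alpha_bar` and `noise_for_inpainting` are hypothetical helpers introduced only for illustration. It shows that, at an intermediate timestep, the unmasked pixels still correlate with the clean image while the masked pixels carry no image signal at all.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_for_inpainting(x0, mask, t, alpha_bar, rng):
    """Return a noisy image at step t.

    Unmasked pixels (mask == 0) follow the closed-form forward step
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps.
    Masked pixels (mask == 1) have no known content, so they are pure noise.
    """
    eps = rng.standard_normal(x0.shape)
    noised_known = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    pure_noise = rng.standard_normal(x0.shape)
    return np.where(mask == 1, pure_noise, noised_known)

# Toy data: a random "texture" in [-1, 1] with a square hole in the middle.
rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
x0 = rng.uniform(-1.0, 1.0, size=(64, 64))
mask = np.zeros((64, 64), dtype=int)
mask[16:48, 16:48] = 1

x_t = noise_for_inpainting(x0, mask, t=300, alpha_bar=alpha_bar, rng=rng)

# Correlation with the clean image, measured separately in each region.
known = mask == 0
corr_known = np.corrcoef(x0[known], x_t[known])[0, 1]
corr_masked = np.corrcoef(x0[~known], x_t[~known])[0, 1]
print(f"unmasked correlation: {corr_known:.3f}")
print(f"masked correlation:   {corr_masked:.3f}")
```

Under these assumptions, the unmasked correlation stays clearly above zero at mid-range timesteps while the masked correlation hovers near zero, which is one simple way to see the gap between the two regions that the abstract calls the semantic discrepancy.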

Abstract (translated)

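Denoising-based image inpainting aims to add noise to the image during the forward process and to recover the masked regions using the texture of the unmasked regions via the reverse denoising process. Although semantic generation has seen many meaningful advances, existing methods have trouble with the semantic discrepancy between masked and unmasked regions, because the semantically dense unmasked texture is not completely degraded during the diffusion process while the masked regions become pure noise, leading to a large discrepancy between them. To answer how unmasked semantics can guide the texture denoising process and how to resolve the semantic discrepancy, this paper proposes a novel structure-guided diffusion model named StrDiffusion, which reformulates the conventional texture denoising process under structure guidance to produce a simple denoising objective, while revealing: 1) in the early stage, the semantically sparse structure helps resolve the semantic discrepancy, while the dense texture produces reasonable semantics in the late stage; 2) the semantics of the unmasked regions provide time-dependent structure guidance for the texture denoising process, exploiting the time dependence of the structure semantics. For the denoising process, we train a structure-guided neural network to estimate the simplified denoising objective, while regulating the denoising effect by adjusting their semantic correlations. Extensive experiments confirm the superiority of StrDiffusion over the existing state of the art. Our code is available here: https://www.xxx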

URL

https://arxiv.org/abs/2403.19898

PDF

https://arxiv.org/pdf/2403.19898.pdf

