DreamLight: Towards Harmonious and Consistent Image Relighting

2025-06-17 14:05:24
Yong Liu, Wenpeng Xiao, Qianqian Wang, Junlin Chen, Shiyin Wang, Yitong Wang, Xinglong Wu, Yansong Tang

Abstract

We introduce DreamLight, a model for universal image relighting that seamlessly composites subjects into a new background while keeping lighting and color tone aesthetically uniform. The background can be specified by a natural image (image-based relighting) or generated from unrestricted text prompts (text-based relighting). Existing studies focus primarily on image-based relighting and leave text-based scenarios largely unexplored. Some works employ intricate disentanglement pipelines that rely on environment maps to provide the relevant information, and they suffer from the high cost of the data required for intrinsic decomposition and light sources. Other methods treat the task as an image translation problem and perform pixel-level transformation with an autoencoder architecture. While these methods achieve decent harmonization, they struggle to generate realistic and natural light interaction between the foreground and background. To address these challenges, we reorganize the input data into a unified format and leverage the semantic prior of a pretrained diffusion model to facilitate the generation of natural results. Moreover, we propose a Position-Guided Light Adapter (PGLA) that condenses light information from different directions in the background into designed light query embeddings and modulates the foreground with direction-biased masked attention. In addition, we present a post-processing module, the Spectral Foreground Fixer (SFF), which adaptively reorganizes the different frequency components of the subject and the relighted background to improve the consistency of the foreground appearance. Extensive comparisons and a user study demonstrate that DreamLight achieves remarkable relighting performance.
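
The abstract does not specify how SFF operates, so the following is only a rough illustration of the general idea of recombining frequency bands, not the authors' module. It assumes a fixed (non-adaptive) split: inside the foreground mask, the low-frequency band (overall lighting and tone) is taken from the relit image and the high-frequency band (fine detail) from the original subject. The function names, the box-blur low-pass filter, and the `radius` parameter are all hypothetical choices made for this sketch.

```python
def box_blur(img, radius=1):
    """Low-pass filter: mean over a (2r+1)x(2r+1) window, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += img[yy][xx]
                        count += 1
            out[y][x] = total / count
    return out


def recombine_frequencies(subject, relit, mask, radius=1):
    """Hypothetical fixed-split recombination (an assumption, not the paper's SFF).

    Inside the mask: low band of `relit` + high band of `subject`.
    Outside the mask: `relit` is returned unchanged.
    All inputs are 2D lists of the same size (grayscale values, 0/1 mask).
    """
    low_subject = box_blur(subject, radius)
    low_relit = box_blur(relit, radius)
    out = [row[:] for row in relit]
    for y in range(len(subject)):
        for x in range(len(subject[0])):
            if mask[y][x]:
                high = subject[y][x] - low_subject[y][x]  # fine detail of the subject
                out[y][x] = low_relit[y][x] + high        # relit tone + original detail
    return out
```

A learned or adaptive variant would presumably replace the fixed blur radius and the hard mask with predicted, spatially varying weights; the sketch only shows why separating frequency bands can preserve subject detail while adopting the new lighting.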

URL

https://arxiv.org/abs/2506.14549

PDF

https://arxiv.org/pdf/2506.14549.pdf

