Abstract
We introduce DreamLight, a model for universal image relighting that seamlessly composites subjects into a new background while maintaining aesthetic uniformity in lighting and color tone. The background can be specified by a natural image (image-based relighting) or generated from unlimited text prompts (text-based relighting). Existing studies focus primarily on image-based relighting, with scant exploration of text-based scenarios. Some works employ intricate disentanglement pipelines that rely on environment maps to provide the relevant information, and these suffer from the high cost of the data required for intrinsic decomposition and light sources. Other methods treat the task as an image translation problem and perform pixel-level transformation with an autoencoder architecture. While these methods achieve decent harmonization, they struggle to generate realistic and natural light interaction between the foreground and background. To alleviate these challenges, we reorganize the input data into a unified format and leverage the semantic prior provided by a pretrained diffusion model to facilitate the generation of natural results. Moreover, we propose a Position-Guided Light Adapter (PGLA) that condenses light information from different directions in the background into designed light query embeddings and modulates the foreground with direction-biased masked attention. In addition, we present a post-processing module named Spectral Foreground Fixer (SFF) that adaptively reorganizes different frequency components of the subject and the relighted background, which helps enhance the consistency of foreground appearance. Extensive comparisons and a user study demonstrate that DreamLight achieves remarkable relighting performance.
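The abstract does not say how SFF separates or recombines frequency components, so the following is only a rough illustration of the general idea of frequency-band recombination, not the paper's module. It assumes a fixed box-filter split into a low band (lighting and color tone) and a high band (fine detail), and a hard foreground mask; SFF is described as adaptive, which this sketch is not. The names `box_blur`, `recombine_frequencies`, and `radius` are hypothetical.

```python
import numpy as np


def box_blur(img, radius):
    """Low-pass an HxW or HxWxC array with a separable box filter (edge-padded)."""
    k = 2 * radius + 1
    kernel = np.ones(k) / k
    out = np.asarray(img, dtype=np.float64)
    for axis in (0, 1):  # filter rows, then columns
        pad = [(0, 0)] * out.ndim
        pad[axis] = (radius, radius)
        padded = np.pad(out, pad, mode="edge")
        out = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="valid"), axis, padded
        )
    return out


def recombine_frequencies(original, relit, mask, radius=2):
    """Assumed recombination rule (not from the paper):
    inside the mask, keep the relit low band and the original high band;
    outside the mask, return the relit image unchanged."""
    original = np.asarray(original, dtype=np.float64)
    relit = np.asarray(relit, dtype=np.float64)
    low_relit = box_blur(relit, radius)                  # lighting / tone
    high_orig = original - box_blur(original, radius)    # subject detail
    fixed = low_relit + high_orig
    # Broadcast a 2-D mask over the channel axis when the image has channels.
    m = mask[..., None] if mask.ndim == relit.ndim - 1 else mask
    return m * fixed + (1 - m) * relit
```

A call such as `recombine_frequencies(subject_image, relit_image, subject_mask)` would return an image of the same shape as `relit_image`. The fixed `radius` stands in for whatever band selection a learned module would perform.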
URL
https://arxiv.org/abs/2506.14549