Abstract
We present the RAW domain diffusion model (RDDM), an end-to-end diffusion model that restores photo-realistic images directly from sensor RAW data. While recent sRGB-domain diffusion methods achieve impressive results, they are caught in a dilemma between high fidelity and realistic generation. These models process lossy sRGB inputs and neglect the fact that sensor RAW images are accessible in many scenarios, e.g., image and video capture on edge devices, which results in sub-optimal performance. RDDM bypasses this limitation by restoring images directly in the RAW domain, replacing the conventional two-stage pipeline of image signal processing (ISP) followed by image restoration (IR). However, naively adapting pre-trained diffusion models to the RAW domain runs into out-of-distribution (OOD) issues. To this end, we propose (1) a RAW-domain VAE (RVAE) that learns optimal latent representations, and (2) a differentiable Post Tone Processing (PTP) module that enables joint optimization in the RAW and sRGB spaces. To compensate for the lack of RAW training data, we develop a scalable degradation pipeline that synthesizes RAW low-quality/high-quality (LQ-HQ) pairs from existing sRGB datasets for large-scale training. Furthermore, we devise a configurable multi-Bayer (CMB) LoRA module that handles diverse RAW patterns such as RGGB and BGGR. Extensive experiments demonstrate RDDM's superiority over state-of-the-art sRGB diffusion methods, yielding higher-fidelity results with fewer artifacts.
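To make the idea of synthesizing RAW data from sRGB images and of handling different Bayer layouts concrete, the following is a minimal sketch and not the paper's degradation pipeline, whose details the abstract does not give. It assumes only two generic steps: inverting the standard sRGB transfer curve and sampling one color channel per pixel according to a 2x2 Bayer tile. All names (`BAYER_PATTERNS`, `srgb_to_linear`, `mosaic`, `synthesize_raw`) are hypothetical, and a realistic pipeline would also model white balance, color correction, noise, and blur.

```python
# Hypothetical illustration only; not the RDDM degradation pipeline.

# Channel index (R=0, G=1, B=2) for each position of the 2x2 Bayer tile,
# in row-major order: top-left, top-right, bottom-left, bottom-right.
BAYER_PATTERNS = {
    "RGGB": (0, 1, 1, 2),
    "BGGR": (2, 1, 1, 0),
    "GRBG": (1, 0, 2, 1),
    "GBRG": (1, 2, 0, 1),
}


def srgb_to_linear(value):
    """Invert the standard sRGB transfer curve for a value in [0, 1]."""
    if value <= 0.04045:
        return value / 12.92
    return ((value + 0.055) / 1.055) ** 2.4


def mosaic(rgb, pattern="RGGB"):
    """Keep one channel per pixel according to the chosen Bayer tile.

    `rgb` is a list of rows, each row a list of (r, g, b) tuples.
    Returns a list of rows of single float values.
    """
    tile = BAYER_PATTERNS[pattern]
    return [
        [pixel[tile[2 * (y % 2) + (x % 2)]] for x, pixel in enumerate(row)]
        for y, row in enumerate(rgb)
    ]


def synthesize_raw(srgb, pattern="RGGB"):
    """Linearize an sRGB image, then mosaic it into a single-channel RAW-like image."""
    linear = [[tuple(srgb_to_linear(c) for c in px) for px in row] for row in srgb]
    return mosaic(linear, pattern)
```

Switching `pattern` from `"RGGB"` to `"BGGR"` changes only which channel is sampled at each pixel position, which is the kind of variation a module for diverse RAW patterns has to accommodate.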
URL
https://arxiv.org/abs/2508.19154