RDDM: Practicing RAW Domain Diffusion Model for Real-world Image Restoration

2025-08-26 16:06:17
Yan Chen, Yi Wen, Wei Li, Junchao Liu, Yong Guo, Jie Hu, Xinghao Chen

Abstract

We present the RAW domain diffusion model (RDDM), an end-to-end diffusion model that restores photo-realistic images directly from sensor RAW data. While recent sRGB-domain diffusion methods achieve impressive results, they are caught in a dilemma between high fidelity and realistic generation. These models process lossy sRGB inputs and neglect the fact that sensor RAW images are accessible in many scenarios, e.g., image and video capture on edge devices, which results in sub-optimal performance. RDDM bypasses this limitation by restoring images directly in the RAW domain, replacing the conventional two-stage pipeline of image signal processing (ISP) followed by image restoration (IR). However, a simple adaptation of pre-trained diffusion models to the RAW domain runs into out-of-distribution (OOD) issues. To this end, we propose: (1) a RAW-domain VAE (RVAE) that learns optimal latent representations, and (2) a differentiable Post Tone Processing (PTP) module that enables joint optimization in RAW and sRGB space. To compensate for the shortage of training data, we develop a scalable degradation pipeline that synthesizes RAW LQ-HQ pairs from existing sRGB datasets for large-scale training. Furthermore, we devise a configurable multi-Bayer (CMB) LoRA module that handles diverse RAW patterns such as RGGB and BGGR. Extensive experiments demonstrate RDDM's superiority over state-of-the-art sRGB diffusion methods, yielding higher-fidelity results with fewer artifacts.
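
To make the idea of synthesizing RAW LQ-HQ pairs from sRGB images and handling several Bayer layouts more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' degradation pipeline: the names `BAYER_PATTERNS`, `mosaic`, and `synthesize_pair` are hypothetical, the inverse gamma of 2.2 is a crude stand-in for proper ISP inversion, and additive Gaussian noise is an assumed placeholder for whatever degradations the paper actually uses.

```python
import numpy as np

# Channel index (0=R, 1=G, 2=B) at each position of the 2x2 Bayer tile,
# in row-major order: top-left, top-right, bottom-left, bottom-right.
BAYER_PATTERNS = {
    "RGGB": (0, 1, 1, 2),
    "BGGR": (2, 1, 1, 0),
    "GRBG": (1, 0, 2, 1),
    "GBRG": (1, 2, 0, 1),
}


def mosaic(rgb, pattern="RGGB"):
    """Sample an (H, W, 3) image into a single-channel Bayer mosaic."""
    h, w, _ = rgb.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    tl, tr, bl, br = BAYER_PATTERNS[pattern]
    raw = np.empty((h, w), dtype=rgb.dtype)
    raw[0::2, 0::2] = rgb[0::2, 0::2, tl]
    raw[0::2, 1::2] = rgb[0::2, 1::2, tr]
    raw[1::2, 0::2] = rgb[1::2, 0::2, bl]
    raw[1::2, 1::2] = rgb[1::2, 1::2, br]
    return raw


def synthesize_pair(srgb, pattern="RGGB", noise_std=0.02, seed=0):
    """Return a (low-quality, high-quality) RAW pair from an sRGB image in [0, 1]."""
    rng = np.random.default_rng(seed)
    # Assumption: approximate linearization with a plain 2.2 power curve.
    linear = np.clip(srgb, 0.0, 1.0) ** 2.2
    hq = mosaic(linear, pattern)
    # Assumption: additive Gaussian noise as the only degradation.
    lq = np.clip(hq + rng.normal(0.0, noise_std, hq.shape), 0.0, 1.0)
    return lq, hq
```

Calling `synthesize_pair(image, pattern="BGGR")` on an `(H, W, 3)` float array with even height and width returns two `(H, W)` mosaics. Switching the `pattern` argument only changes which color channel is sampled at each position of the 2x2 tile.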

URL

https://arxiv.org/abs/2508.19154

PDF

https://arxiv.org/pdf/2508.19154.pdf

