Paper Reading AI Learner

Boosting Fidelity for Pre-Trained-Diffusion-Based Low-Light Image Enhancement via Condition Refinement

2025-10-20 02:40:06
Xiaogang Xu, Jian Wang, Yunfan Lu, Ruihang Chu, Ruixing Wang, Jiafei Wu, Bei Yu, Liang Lin

Abstract

Diffusion-based methods that leverage pre-trained large models such as Stable Diffusion via ControlNet have achieved remarkable performance on several low-level vision tasks. However, Pre-Trained Diffusion-Based (PTDB) methods often sacrifice content fidelity to attain higher perceptual realism. This issue is exacerbated in low-light scenarios, where information severely degraded by darkness limits effective control. We identify two primary causes of fidelity loss: the absence of suitable conditional latent modeling and the lack of bidirectional interaction between the conditional latent and the noisy latent during the diffusion process. To address this, we propose a novel optimization strategy for conditioning in pre-trained diffusion models that enhances fidelity while preserving realism and aesthetics. Our method introduces a mechanism to recover spatial details lost during VAE encoding: a latent refinement pipeline that incorporates generative priors. Additionally, the refined latent condition interacts dynamically with the noisy latent, improving restoration performance. Our approach is plug-and-play, integrating seamlessly into existing diffusion networks to provide more effective control. Extensive experiments demonstrate significant fidelity improvements in PTDB methods.
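The abstract names two mechanisms: refining the conditional latent with generative priors to recover detail lost in VAE encoding, and letting the condition and noisy latents update each other rather than having the condition act one-way. A minimal, hypothetical NumPy sketch of these two ideas (all function names and update rules are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def refine_condition(cond_latent, prior):
    """Hypothetical latent refinement: add a residual derived from a
    generative prior to compensate for detail lost in VAE encoding."""
    return cond_latent + 0.1 * prior  # toy residual correction

def bidirectional_step(noisy_latent, cond_latent):
    """Toy bidirectional interaction: each latent is nudged toward the
    other, instead of the condition influencing the noisy latent only."""
    new_noisy = noisy_latent + 0.3 * (cond_latent - noisy_latent)
    new_cond = cond_latent + 0.3 * (noisy_latent - cond_latent)
    return new_noisy, new_cond

rng = np.random.default_rng(0)
shape = (4, 8, 8)                     # toy latent shape (channels, H, W)
noisy = rng.normal(size=shape)        # stand-in for the noisy latent
cond = rng.normal(size=shape)         # stand-in for the condition latent
prior = rng.normal(size=shape)        # stand-in for a generative prior

cond = refine_condition(cond, prior)
for _ in range(10):                   # a few toy "denoising" steps
    noisy, cond = bidirectional_step(noisy, cond)

# The mutual updates shrink the gap between the two latents by a factor
# of 0.4 per step, so after 10 steps they are nearly identical.
gap = float(np.abs(noisy - cond).max())
```

In a real PTDB pipeline these arrays would be VAE latents and the updates would be learned network modules; the sketch only shows the data flow the abstract describes.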


URL

https://arxiv.org/abs/2510.17105

PDF

https://arxiv.org/pdf/2510.17105.pdf

