Abstract
Diffusion-based methods that leverage pre-trained large models such as Stable Diffusion via ControlNet have achieved remarkable performance on several low-level vision tasks. However, pre-trained diffusion-based (PTDB) methods often sacrifice content fidelity to attain higher perceptual realism. This issue is exacerbated in low-light scenarios, where darkness severely degrades the input and limits effective control. We identify two primary causes of fidelity loss: the absence of suitable conditional latent modeling and the lack of bidirectional interaction between the conditional latent and the noisy latent during the diffusion process. To address this, we propose a novel optimization strategy for conditioning in pre-trained diffusion models that enhances fidelity while preserving realism and aesthetics. Our method introduces a mechanism to recover spatial details lost during VAE encoding: a latent refinement pipeline that incorporates generative priors. In addition, the refined conditional latent interacts dynamically with the noisy latent, further improving restoration performance. Our approach is plug-and-play and integrates seamlessly into existing diffusion networks to provide more effective control. Extensive experiments demonstrate significant fidelity improvements over existing PTDB methods.
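The abstract gives no implementation details, but the two ideas it names, refining the degraded conditional latent with a generative prior and letting the condition and noisy latent exchange information in both directions rather than flowing one way as in standard ControlNet conditioning, can be sketched abstractly. The NumPy sketch below is purely illustrative: the `refine` blend, the sigmoid gates, the blending weight, and the tensor shapes are all assumptions for exposition, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def refine(cond_latent, prior):
    """Hypothetical latent refinement: blend a generative-prior latent back
    into the degraded conditional latent to stand in for detail recovery
    after lossy VAE encoding (alpha is an assumed blending weight)."""
    alpha = 0.3
    return (1 - alpha) * cond_latent + alpha * prior

def bidirectional_step(cond, noisy):
    """Hypothetical bidirectional interaction: each latent receives a gated
    residual from the other, in contrast to one-way conditioning where only
    the noisy latent is updated."""
    gate_c = 1.0 / (1.0 + np.exp(-cond))    # sigmoid gate (assumption)
    gate_n = 1.0 / (1.0 + np.exp(-noisy))
    new_noisy = noisy + gate_n * cond       # condition guides denoising
    new_cond = cond + gate_c * noisy        # noisy latent updates the condition
    return new_cond, new_noisy

cond = rng.standard_normal((4, 8, 8))    # conditional latent (C, H, W)
prior = rng.standard_normal((4, 8, 8))   # generative-prior latent
noisy = rng.standard_normal((4, 8, 8))   # diffusion noisy latent

cond = refine(cond, prior)
cond, noisy = bidirectional_step(cond, noisy)
print(cond.shape, noisy.shape)
```

Both operations preserve latent shapes, which is consistent with the plug-and-play claim: a module like this could be spliced into an existing diffusion network without changing its interfaces.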
URL
https://arxiv.org/abs/2510.17105