Abstract
Low-light vision has traditionally treated low-level enhancement and high-level visual understanding as separate problems. Low-light enhancement improves image quality for downstream tasks, but existing methods rely on physical or geometric priors, which limits their generalization, and their evaluation focuses mainly on visual quality rather than downstream performance. Low-light visual understanding, constrained by scarce labeled data, relies primarily on task-specific domain adaptation, which lacks scalability. To address these challenges, we build a generalized bridge between low-light enhancement and low-light understanding, which we term Generalized Enhancement For Understanding (GEFU). This paradigm improves both generalization and scalability. To handle the diverse causes of low-light degradation, we leverage pretrained generative diffusion models to optimize images, achieving zero-shot generalization. Building on this, we propose Semantically Consistent Unsupervised Fine-tuning (SCUF). Specifically, to overcome the limitations of text prompts, we introduce an illumination-aware image prompt that explicitly guides image generation, and we propose a cycle-attention adapter to maximize its semantic potential. To mitigate semantic degradation in unsupervised training, we propose caption consistency and reflectance consistency to learn high-level semantics and image-level spatial semantics. Extensive experiments demonstrate that our method outperforms current state-of-the-art methods on both traditional image quality and GEFU tasks, including classification, detection, and semantic segmentation.
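The abstract names reflectance consistency without giving its formulation. The sketch below is only an illustration of the general idea under a Retinex-style assumption (image = reflectance × illumination), and it is not the paper's loss. The function names `reflectance_proxy` and `reflectance_consistency_loss`, the max-channel illumination estimate, and the L1 distance are all assumptions chosen for simplicity.

```python
import numpy as np

def reflectance_proxy(image, eps=1e-6):
    """Crude Retinex-style reflectance estimate (an assumption, not the paper's).

    Illumination is approximated per pixel by the maximum over color channels,
    and reflectance is the image divided by that illumination.
    """
    illumination = image.max(axis=-1, keepdims=True)
    return image / (illumination + eps)

def reflectance_consistency_loss(low_light, enhanced, eps=1e-6):
    """Mean absolute difference between the two reflectance estimates."""
    r_low = reflectance_proxy(low_light, eps)
    r_enh = reflectance_proxy(enhanced, eps)
    return float(np.mean(np.abs(r_low - r_enh)))

# Synthetic example: one scene rendered at two brightness levels.
rng = np.random.default_rng(0)
scene = rng.uniform(0.2, 1.0, size=(8, 8, 3))
low_light = 0.1 * scene                      # dark capture
brightened = 0.9 * scene                     # pure brightness change
color_shifted = brightened * np.array([1.0, 0.5, 1.0])  # content is altered

loss_same = reflectance_consistency_loss(low_light, brightened)
loss_shifted = reflectance_consistency_loss(low_light, color_shifted)
print(loss_same, loss_shifted)
```

Under this proxy, a pure brightness change leaves the reflectance estimate almost unchanged, so the loss stays near zero, while an enhancement that alters color content is penalized. That is the intuition behind using such a term to discourage semantic drift during unsupervised fine-tuning.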
URL
https://arxiv.org/abs/2507.08380