Paper Reading AI Learner

From Enhancement to Understanding: Build a Generalized Bridge for Low-light Vision via Semantically Consistent Unsupervised Fine-tuning

2025-07-11 07:51:26
Sen Wang, Shao Zeng, Tianjun Gu, Zhizhong Zhang, Ruixin Zhang, Shouhong Ding, Jingyun Zhang, Jun Wang, Xin Tan, Yuan Xie, Lizhuang Ma

Abstract

Low-level enhancement and high-level visual understanding in low-light vision have traditionally been treated separately. Low-light enhancement improves image quality for downstream tasks, but existing methods rely on physical or geometric priors, limiting generalization. Evaluation mainly focuses on visual quality rather than downstream performance. Low-light visual understanding, constrained by scarce labeled data, primarily uses task-specific domain adaptation, which lacks scalability. To address these challenges, we build a generalized bridge between low-light enhancement and low-light understanding, which we term Generalized Enhancement For Understanding (GEFU). This paradigm improves both generalization and scalability. To address the diverse causes of low-light degradation, we leverage pretrained generative diffusion models to optimize images, achieving zero-shot generalization performance. Building on this, we propose Semantically Consistent Unsupervised Fine-tuning (SCUF). Specifically, to overcome text prompt limitations, we introduce an illumination-aware image prompt to explicitly guide image generation and propose a cycle-attention adapter to maximize its semantic potential. To mitigate semantic degradation in unsupervised training, we propose caption and reflectance consistency to learn high-level semantics and image-level spatial semantics. Extensive experiments demonstrate that our proposed method outperforms current state-of-the-art methods in traditional image quality and GEFU tasks including classification, detection, and semantic segmentation.

Abstract (translated)

在低光视觉中,底层增强和高层视觉理解传统上被分别对待。低光增强可以提升图像质量以支持下游任务的性能,但现有方法依赖于物理或几何先验知识,这限制了它们的泛化能力。评价主要集中在视觉质量而非下游任务的表现上。低光照下的视觉理解由于标注数据稀缺,通常采用特定任务领域的适应方法,这种方法缺乏可扩展性。 为解决这些挑战,我们建立了一个将低光增强和低光理解连接起来的一般桥梁,并将其命名为“用于理解的广义增强”(Generalized Enhancement For Understanding, GEFU)。这种范式能够同时提升泛化能力和可扩展性。为了应对各种低光照退化的成因,我们利用预训练的生成扩散模型来优化图像,实现零样本学习下的性能。 在此基础上,我们提出了语义一致性的无监督微调(Semantically Consistent Unsupervised Fine-tuning, SCUF)。具体来说,为克服文本提示的局限性,我们引入了光照感知型图像提示,以明确指导图像生成,并提出了一种循环注意力适配器来最大化其语义潜力。为了减轻无监督训练中的语义退化问题,我们提出了标题和反射一致性以学习高级语义及图片级空间语义。 广泛实验表明,所提出的这种方法在传统图像质量和包括分类、检测以及语义分割在内的GEFU任务上均超越了现有的最先进的方法。

URL

https://arxiv.org/abs/2507.08380

PDF

https://arxiv.org/pdf/2507.08380.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Time_Series Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot