Towards Scale-Aware Low-Light Enhancement via Structure-Guided Transformer Design

2025-04-18 20:57:16
Wei Dong, Yan Min, Han Zhou, Jun Chen

Abstract

Current Low-light Image Enhancement (LLIE) techniques predominantly rely on either direct Low-Light (LL) to Normal-Light (NL) mappings or on guidance from semantic features or illumination maps. Nonetheless, the intrinsic ill-posedness of LLIE and the difficulty of retrieving robust semantics from heavily corrupted images hinder their effectiveness in extremely low-light environments. To tackle this challenge, we present SG-LLIE, a new multi-scale CNN-Transformer hybrid framework guided by structure priors. Rather than employing pre-trained models to extract semantics or illumination maps, we extract robust structure priors with illumination-invariant edge detectors. Moreover, we develop a CNN-Transformer Hybrid Structure-Guided Feature Extractor (HSGFE) module at each scale within the UNet encoder-decoder architecture. Besides the CNN blocks, which excel at multi-scale feature extraction and fusion, we introduce a Structure-Guided Transformer Block (SGTB) in each HSGFE that incorporates structural priors to modulate the enhancement process. Extensive experiments show that our method achieves state-of-the-art performance on several LLIE benchmarks in both quantitative metrics and visual quality. Our solution ranks second in the NTIRE 2025 Low-Light Enhancement Challenge. Code is released at this https URL.
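
The abstract does not detail how the structure prior is computed or how the SGTB consumes it. The following PyTorch sketch illustrates one plausible reading under stated assumptions, not the authors' actual implementation: a Sobel-based, per-image-normalized edge map stands in for the illumination-invariant structure prior, and a hypothetical StructureGuidedTransformerBlock applies FiLM-style scale-and-shift modulation to the features before self-attention. All names and design choices below are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

def sobel_edge_prior(img: torch.Tensor) -> torch.Tensor:
    """Edge-magnitude structure prior (assumption: Sobel stands in for the
    paper's illumination-invariant edge detector). img: (B, 3, H, W) in [0, 1]."""
    gray = img.mean(dim=1, keepdim=True)                      # (B, 1, H, W)
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)                                   # Sobel-y kernel
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    mag = torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)
    # Per-image normalization: a crude surrogate for illumination invariance.
    return mag / mag.amax(dim=(2, 3), keepdim=True).clamp(min=1e-6)

class StructureGuidedTransformerBlock(nn.Module):
    """Hypothetical SGTB: the prior produces per-channel scale/shift terms
    (FiLM-style) that modulate the features ahead of self-attention."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.to_scale_shift = nn.Conv2d(1, 2 * dim, kernel_size=1)

    def forward(self, feat: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape
        # Resize the prior to this scale of the UNet encoder-decoder.
        prior = F.interpolate(prior, size=(h, w), mode='bilinear',
                              align_corners=False)
        scale, shift = self.to_scale_shift(prior).chunk(2, dim=1)
        x = feat * (1 + scale) + shift                 # structure-guided modulation
        tokens = self.norm(x.flatten(2).transpose(1, 2))  # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        return feat + attn_out.transpose(1, 2).reshape(b, c, h, w)  # residual

# Usage sketch: one block per scale, the prior shared across scales.
# prior = sobel_edge_prior(torch.rand(1, 3, 128, 128))
# block = StructureGuidedTransformerBlock(dim=64)
# out = block(torch.randn(1, 64, 32, 32), prior)

In a full model, one such block would sit inside the HSGFE at each scale of the UNet, alongside the CNN branches responsible for multi-scale feature extraction and fusion.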

URL

https://arxiv.org/abs/2504.14075

PDF

https://arxiv.org/pdf/2504.14075.pdf

