Paper Reading AI Learner

Rethinking Efficient Hierarchical Mixing Architecture for Low-light RAW Image Enhancement

2025-10-17 10:09:38
Xianmin Chen, Peiliang Huang, Longfei Han, Dingwen Zhang, Junwei Han

Abstract

Low-light RAW image enhancement remains a challenging task. Although numerous deep learning-based approaches have been proposed, they still suffer from inherent limitations. A key challenge is how to simultaneously achieve strong enhancement quality and high efficiency. In this paper, we rethink the architecture for efficient low-light image signal processing (ISP) and introduce a Hierarchical Mixing Architecture (HiMA). HiMA leverages the complementary strengths of Transformer and Mamba modules to handle features at large and small scales, respectively, thereby improving efficiency while avoiding the ambiguities observed in prior two-stage frameworks. To further address uneven illumination with strong local variations, we propose Local Distribution Adjustment (LoDA), which adaptively aligns feature distributions across different local regions. In addition, to fully exploit the denoised outputs from the first stage, we design a Multi-prior Fusion (MPF) module that integrates spatial and frequency-domain priors for detail enhancement. Extensive experiments on multiple public datasets demonstrate that our method outperforms state-of-the-art approaches, achieving superior performance with fewer parameters. Code will be released at this https URL.
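The abstract describes LoDA as adaptively aligning feature distributions across different local regions. As a rough illustration of what local statistics alignment can mean in general, here is a minimal NumPy sketch: the function name, the non-overlapping tiling scheme, and the mean/std alignment rule are all assumptions for illustration, not the paper's actual LoDA design.

```python
import numpy as np

def local_distribution_adjust(feat, ref, tile=8, eps=1e-6):
    """Illustrative local statistics alignment (hypothetical, not the paper's LoDA).

    For each non-overlapping tile, shift and scale the feature's local
    mean/std toward the statistics of the corresponding tile in a
    reference map, so each region is adjusted independently.
    """
    h, w = feat.shape
    out = feat.astype(np.float64).copy()
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            f = feat[y:y + tile, x:x + tile]
            r = ref[y:y + tile, x:x + tile]
            # Normalize the local patch, then re-scale to the reference stats.
            out[y:y + tile, x:x + tile] = (
                (f - f.mean()) / (f.std() + eps) * r.std() + r.mean()
            )
    return out
```

Because each tile is handled with its own statistics, a dark region and a bright region of the same image receive different adjustments, which is the intuition behind handling uneven illumination with strong local variations.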

URL

https://arxiv.org/abs/2510.15497

PDF

https://arxiv.org/pdf/2510.15497.pdf

