Paper Reading AI Learner

FusionNet: Multi-model Linear Fusion Framework for Low-light Image Enhancement

2025-04-27 16:22:03
Kangbiao Shi, Yixu Feng, Tao Hu, Yu Cao, Peng Wu, Yijin Liang, Yanning Zhang, Qingsen Yan

Abstract

The advent of Deep Neural Networks (DNNs) has driven remarkable progress in low-light image enhancement (LLIE), with diverse architectures (e.g., CNNs and Transformers) and color spaces (e.g., sRGB, HSV, HVI) yielding impressive results. Recent efforts have sought to leverage the complementary strengths of these paradigms, offering promising solutions to enhance performance across varying degradation scenarios. However, existing fusion strategies are hindered by challenges such as parameter explosion, optimization instability, and feature misalignment, limiting further improvements. To overcome these issues, we introduce FusionNet, a novel multi-model linear fusion framework that operates in parallel to effectively capture global and local features across diverse color spaces. By incorporating a linear fusion strategy underpinned by Hilbert space theoretical guarantees, FusionNet mitigates network collapse and reduces excessive training costs. Our method achieved 1st place in the CVPR 2025 NTIRE Low Light Enhancement Challenge. Extensive experiments conducted on synthetic and real-world benchmark datasets demonstrate that the proposed method significantly outperforms state-of-the-art methods in terms of both quantitative and qualitative results, delivering robust enhancement under diverse low-light conditions.
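The fusion idea described above lends itself to a compact illustration. The following is a minimal sketch, not the paper's actual implementation: the LinearFusion class, the placeholder Conv2d branches, and the softmax normalization are all illustrative assumptions standing in for the paper's parallel CNN/Transformer models over sRGB/HSV/HVI.

import torch
import torch.nn as nn

class LinearFusion(nn.Module):
    """Combine the outputs of parallel enhancement branches with
    learnable scalar weights (a convex combination via softmax)."""
    def __init__(self, branches):
        super().__init__()
        self.branches = nn.ModuleList(branches)
        # One learnable fusion coefficient per branch.
        self.weights = nn.Parameter(torch.ones(len(branches)))

    def forward(self, x):
        # Run every branch on the same low-light input in parallel.
        outputs = torch.stack([b(x) for b in self.branches])  # (K, B, C, H, W)
        w = torch.softmax(self.weights, dim=0)                # weights sum to 1
        # Weighted sum over the branch dimension K.
        return torch.einsum("k,kbchw->bchw", w, outputs)

# Toy usage: each Conv2d stands in for a full LLIE model that would,
# in the real framework, operate in its own color space.
branches = [nn.Conv2d(3, 3, 3, padding=1) for _ in range(3)]
model = LinearFusion(branches)
enhanced = model(torch.rand(1, 3, 64, 64))
print(enhanced.shape)  # torch.Size([1, 3, 64, 64])

Constraining the fusion to a linear combination adds only one scalar parameter per branch, which is consistent with the abstract's emphasis on avoiding parameter explosion and optimization instability.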

URL

https://arxiv.org/abs/2504.19295

PDF

https://arxiv.org/pdf/2504.19295.pdf

