Multi-Scale Target-Aware Representation Learning for Fundus Image Enhancement

2025-05-03 14:25:48
Haofan Wu, Yin Huang, Yuqing Wu, Qiuyu Yang, Bingfang Wang, Li Zhang, Muhammad Fahadullah Khan, Ali Zia, M. Saleh Memon, Syed Sohail Bukhari, Abdul Fattah Memon, Daizong Ji, Ya Zhang, Ghulam Mustafa, Yin Fang

Abstract

High-quality fundus images provide essential anatomical information for clinical screening and ophthalmic disease diagnosis. Yet, due to hardware limitations, operational variability, and inconsistent patient compliance, fundus images often suffer from low resolution and a low signal-to-noise ratio. Recent years have witnessed promising progress in fundus image enhancement. However, existing works usually focus on restoring either structural details or global characteristics of fundus images, and lack a unified image enhancement framework that recovers comprehensive multi-scale information. Moreover, few methods pinpoint the target of image enhancement, e.g., lesions, which is crucial for medical image-based diagnosis. To address these challenges, we propose a multi-scale target-aware representation learning framework (MTRL-FIE) for efficient fundus image enhancement. Specifically, we propose a multi-scale feature encoder (MFE) that employs wavelet decomposition to embed both low-frequency structural information and high-frequency details. Next, we design a structure-preserving hierarchical decoder (SHD) to fuse multi-scale feature embeddings for real fundus image restoration. SHD integrates hierarchical fusion and group attention mechanisms to achieve adaptive feature fusion while retaining local structural smoothness. Meanwhile, a target-aware feature aggregation (TFA) module is used to enhance pathological regions and reduce artifacts. Experimental results on multiple fundus image datasets demonstrate the effectiveness and generalizability of MTRL-FIE for fundus image enhancement. Compared to state-of-the-art methods, MTRL-FIE achieves superior enhancement performance with a more lightweight architecture. Furthermore, our approach generalizes to other ophthalmic image processing tasks without supervised fine-tuning, highlighting its potential for clinical applications.
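
To make the wavelet decomposition step concrete, the following is a minimal sketch of a single-level 2D wavelet transform that splits an image into one low-frequency band and three high-frequency detail bands. It is not the authors' MFE: the abstract does not say which wavelet family or how many levels are used, so the Haar wavelet, the single level, and the names `haar_dwt2` and `haar_idwt2` are all assumptions chosen for illustration.

```python
def haar_dwt2(image):
    """Single-level 2D Haar decomposition (illustrative, not the paper's MFE).

    `image` is a list of rows of numbers with even height and width.
    Returns (ll, lh, hl, hh): ll holds low-frequency structure, the other
    three hold high-frequency detail. Sub-band naming conventions vary.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, rows, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, cols, 2):
            # One 2x2 block: a b / p q
            a, b = image[r][c], image[r][c + 1]
            p, q = image[r + 1][c], image[r + 1][c + 1]
            ll_row.append((a + b + p + q) / 2)  # local average (scaled)
            lh_row.append((a - b + p - q) / 2)  # left-right difference
            hl_row.append((a + b - p - q) / 2)  # top-bottom difference
            hh_row.append((a - b - p + q) / 2)  # diagonal difference
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Invert haar_dwt2, rebuilding the original image from its sub-bands."""
    image = []
    for i in range(len(ll)):
        top, bottom = [], []
        for j in range(len(ll[0])):
            s, x, y, z = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            top.extend([(s + x + y + z) / 2, (s - x + y - z) / 2])
            bottom.extend([(s + x - y - z) / 2, (s - x - y + z) / 2])
        image.append(top)
        image.append(bottom)
    return image
```

Each sub-band has half the height and width of the input, and applying `haar_dwt2` again to the low-frequency band would give a coarser scale. A perfectly flat image produces all-zero detail bands, which is why edges, vessels, and noise end up in the high-frequency outputs.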

URL

https://arxiv.org/abs/2505.01831

PDF

https://arxiv.org/pdf/2505.01831.pdf

