Abstract
High-quality fundus images provide essential anatomical information for clinical screening and ophthalmic disease diagnosis. Yet, due to hardware limitations, operational variability, and patient compliance, fundus images often suffer from low resolution and a low signal-to-noise ratio. Recent years have witnessed promising progress in fundus image enhancement. However, existing works usually focus on restoring either the structural details or the global characteristics of fundus images, and they lack a unified enhancement framework that recovers comprehensive multi-scale information. Moreover, few methods explicitly identify the targets of image enhancement, e.g., lesions, which is crucial for medical image-based diagnosis. To address these challenges, we propose a multi-scale target-aware representation learning framework (MTRL-FIE) for efficient fundus image enhancement. Specifically, we propose a multi-scale feature encoder (MFE) that employs wavelet decomposition to embed both low-frequency structural information and high-frequency details. Next, we design a structure-preserving hierarchical decoder (SHD) that fuses the multi-scale feature embeddings to restore real fundus images. SHD integrates hierarchical fusion and group attention mechanisms to achieve adaptive feature fusion while retaining local structural smoothness. Meanwhile, a target-aware feature aggregation (TFA) module is used to enhance pathological regions and reduce artifacts. Experimental results on multiple fundus image datasets demonstrate the effectiveness and generalizability of MTRL-FIE for fundus image enhancement. Compared to state-of-the-art methods, MTRL-FIE achieves superior enhancement performance with a more lightweight architecture. Furthermore, our approach generalizes to other ophthalmic image processing tasks without supervised fine-tuning, highlighting its potential for clinical applications.
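The wavelet decomposition that the MFE relies on can be illustrated with a minimal sketch. The abstract does not state which wavelet family or how many decomposition levels MTRL-FIE uses, so the single-level Haar transform below is an assumption chosen for simplicity, and the function names `haar_dwt2` and `haar_idwt2` are hypothetical. It only shows how one image splits into a low-frequency approximation and three high-frequency detail subbands, and that the split is lossless; it is not the paper's encoder.

```python
import numpy as np


def haar_dwt2(image):
    """Single-level 2D Haar transform (illustrative; wavelet choice is an assumption)."""
    x = np.asarray(image, dtype=np.float64)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")
    # The four pixels of every non-overlapping 2x2 block.
    a = x[0::2, 0::2]  # top-left
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0      # low-frequency structure
    row_detail = (a + b - c - d) / 2.0  # top row minus bottom row
    col_detail = (a - b + c - d) / 2.0  # left column minus right column
    diag_detail = (a - b - c + d) / 2.0 # diagonal differences
    return approx, row_detail, col_detail, diag_detail


def haar_idwt2(approx, row_detail, col_detail, diag_detail):
    """Invert haar_dwt2 exactly, rebuilding the full-resolution image."""
    h, w = approx.shape
    x = np.empty((2 * h, 2 * w), dtype=np.float64)
    x[0::2, 0::2] = (approx + row_detail + col_detail + diag_detail) / 2.0
    x[0::2, 1::2] = (approx + row_detail - col_detail - diag_detail) / 2.0
    x[1::2, 0::2] = (approx - row_detail + col_detail - diag_detail) / 2.0
    x[1::2, 1::2] = (approx - row_detail - col_detail + diag_detail) / 2.0
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((8, 8))  # stand-in for one channel of a fundus image
    subbands = haar_dwt2(img)
    print([s.shape for s in subbands])
    print(np.allclose(haar_idwt2(*subbands), img))
```

Each subband has half the spatial resolution of the input, so applying the transform again to the approximation subband would give a coarser scale. That is one common way to build a multi-scale representation, though whether MTRL-FIE does so is not stated in the abstract.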
URL
https://arxiv.org/abs/2505.01831