Abstract
Automatic retinal layer segmentation of medical images, such as optical coherence tomography (OCT) images, serves as an important tool for diagnosing ophthalmic diseases. However, accurate segmentation is challenging due to the low contrast and blood flow noise present in the images. In addition, the algorithm should be lightweight enough to be deployed in practical clinical applications. It is therefore desirable to design a lightweight, high-performance network for retinal layer segmentation. In this paper, we propose LightReSeg, a retinal layer segmentation method that can be applied to OCT images. Specifically, our approach follows an encoder-decoder structure: the encoder employs multi-scale feature extraction and a Transformer block to fully exploit the semantic information of feature maps at all scales and to give the features better global reasoning capabilities, while in the decoder we design a multi-scale asymmetric attention (MAA) module to preserve the semantic information at each encoder scale. Experiments show that, with only 3.3M parameters, our approach achieves better segmentation performance than the current state-of-the-art method TransUnet (105.7M parameters) on both our collected dataset and two public datasets.
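The paper does not give implementation details here, but the described pipeline (multi-scale encoder, Transformer bottleneck on the coarsest features, attention-gated skip fusion in the decoder) can be illustrated with a minimal NumPy sketch. Everything below is an assumption for illustration: the pooling/upsampling choices, the toy single-head attention with identity projections, and the simplified per-pixel gate are hypothetical stand-ins, not the actual LightReSeg or MAA design.

```python
import numpy as np

def avg_pool2(x):
    # 2x2 average pooling over an (H, W, C) feature map
    H, W, C = x.shape
    return x[:H // 2 * 2, :W // 2 * 2].reshape(H // 2, 2, W // 2, 2, C).mean(axis=(1, 3))

def self_attention(tokens):
    # toy single-head self-attention with identity Q/K/V projections
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)
    return w @ tokens

def upsample2(x):
    # nearest-neighbour 2x upsampling
    return x.repeat(2, axis=0).repeat(2, axis=1)

def attention_gate(skip, up):
    # hypothetical simplified gate: a per-pixel sigmoid weight
    # reweights the encoder skip features before fusing them
    g = 1.0 / (1.0 + np.exp(-(skip + up).mean(axis=-1, keepdims=True)))
    return g * skip + up

def encode_decode(img, levels=2):
    # encoder: keep features at every scale
    feats = [img]
    for _ in range(levels):
        feats.append(avg_pool2(feats[-1]))
    # Transformer-style bottleneck on the coarsest scale,
    # treating each spatial position as a token
    h, w, c = feats[-1].shape
    bottleneck = self_attention(feats[-1].reshape(h * w, c)).reshape(h, w, c)
    # decoder: upsample and fuse each encoder scale via the gate
    x = bottleneck
    for skip in reversed(feats[:-1]):
        x = attention_gate(skip, upsample2(x))
    return x

out = encode_decode(np.random.rand(8, 8, 4))
print(out.shape)  # (8, 8, 4): decoder restores the input resolution
```

The sketch only shows the data flow; a real implementation would use learned convolutions and projections at every stage, and a segmentation head on `out`.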
URL
https://arxiv.org/abs/2404.16346