Paper Reading AI Learner

GLIMS: Attention-Guided Lightweight Multi-Scale Hybrid Network for Volumetric Semantic Segmentation

2024-04-27 10:18:55
Ziya Ata Yazıcı, İlkay Öksüz, Hazım Kemal Ekenel

Abstract

Convolutional Neural Networks (CNNs) are widely adopted for medical image segmentation tasks and demonstrate promising performance. However, the inherent inductive biases of convolutional architectures limit their ability to model long-range dependencies and spatial correlations. While recent transformer-based architectures address these limitations by leveraging self-attention mechanisms to encode long-range dependencies and learn expressive representations, they often struggle to extract low-level features and are highly dependent on data availability. This motivated us to develop GLIMS, a data-efficient, attention-guided hybrid volumetric segmentation network. GLIMS utilizes Dilated Feature Aggregator Convolutional Blocks (DACB) to capture local-global feature correlations efficiently. Furthermore, an incorporated Swin Transformer-based bottleneck bridges the local and global features to improve the robustness of the model. Additionally, GLIMS employs an attention-guided segmentation approach through Channel and Spatial-Wise Attention Blocks (CSAB) to localize expressive features for fine-grained border segmentation. Quantitative and qualitative results on glioblastoma and multi-organ CT segmentation tasks demonstrate GLIMS' effectiveness in terms of both complexity and accuracy. GLIMS outperforms Swin UNETR on the BraTS2021 and BTCV datasets while using significantly fewer trainable parameters: 47.16M parameters and 72.30G FLOPs, compared to Swin UNETR's 61.98M parameters and 394.84G FLOPs. The code is publicly available on this https URL.
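
The abstract names two convolutional components, DACB and CSAB, without detailing their internals. As a rough illustration only, the PyTorch sketch below shows what a dilated multi-scale aggregation block and a channel/spatial attention gate of this kind typically look like; the class names, kernel sizes, dilation rates, and normalization choices here are assumptions for illustration, not the paper's exact design.

import torch
import torch.nn as nn

class DilatedFeatureAggregator(nn.Module):
    # DACB-style block (sketch): parallel 3D convolutions with growing
    # dilation rates capture local and wider context at the same
    # resolution; a 1x1x1 convolution fuses the branches back to the
    # input channel count.
    def __init__(self, channels, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv3d(channels, channels, kernel_size=3,
                          padding=r, dilation=r, bias=False),
                nn.InstanceNorm3d(channels),
                nn.LeakyReLU(inplace=True),
            )
            for r in rates
        ])
        self.fuse = nn.Conv3d(len(rates) * channels, channels, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))

class ChannelSpatialAttention(nn.Module):
    # CSAB-style gate (sketch): squeeze-and-excitation channel
    # reweighting followed by a single-channel spatial attention map.
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(
            nn.Conv3d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_gate(x)     # emphasize informative channels
        return x * self.spatial_gate(x)  # emphasize informative voxels

# Toy usage on a small volumetric feature map; shapes are preserved.
x = torch.randn(1, 32, 16, 16, 16)
y = ChannelSpatialAttention(32)(DilatedFeatureAggregator(32)(x))
print(y.shape)  # torch.Size([1, 32, 16, 16, 16])

In the actual network, blocks of this kind would sit inside an encoder-decoder together with the Swin Transformer-based bottleneck mentioned above; see the paper for the full architecture.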

URL

https://arxiv.org/abs/2404.17854

PDF

https://arxiv.org/pdf/2404.17854.pdf

