
When Medical Imaging Met Self-Attention: A Love Story That Didn't Quite Work Out

2024-04-18 16:18:41
Tristan Piater, Niklas Penzel, Gideon Stein, Joachim Denzler

Abstract

A substantial body of research has focused on developing systems that assist medical professionals during labor-intensive early screening processes, many of them based on convolutional deep-learning architectures. Recently, multiple studies have explored the application of so-called self-attention mechanisms in the vision domain. These studies often report empirical improvements over fully convolutional approaches on various datasets and tasks. To evaluate this trend for medical imaging, we extend two widely adopted convolutional architectures with different self-attention variants on two different medical datasets. With this, we aim to specifically evaluate the possible advantages of adding self-attention. We compare our models with similarly sized convolutional and attention-based baselines and evaluate performance gains statistically. Additionally, we investigate how including such layers changes the features learned by these models during training. Following a hyperparameter search, and contrary to our expectations, we observe no significant improvement in balanced accuracy over fully convolutional models. We also find that important features, such as dermoscopic structures in skin lesion images, are still not learned even when self-attention is employed. Finally, by analyzing local explanations, we confirm this biased feature usage. We conclude that merely incorporating attention is insufficient to surpass the performance of existing fully convolutional methods.
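
Illustrative Sketch

The page does not include code, so the following is a minimal, hypothetical PyTorch sketch of what "extending a convolutional architecture with self-attention" can look like. The class names, layer sizes, and the placement of the attention block are illustrative assumptions for demonstration, not the authors' actual models or configuration.

# Minimal sketch (not the paper's exact models): a small CNN backbone with an
# optional multi-head self-attention block inserted after the last conv stage.
# All layer sizes here are illustrative assumptions, not values from the paper.
import torch
import torch.nn as nn


class SelfAttention2d(nn.Module):
    """Multi-head self-attention over the spatial positions of a feature map."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)      # (B, H*W, C): one token per position
        attended, _ = self.attn(seq, seq, seq)  # self-attention across positions
        seq = self.norm(seq + attended)         # residual + norm, transformer-style
        return seq.transpose(1, 2).reshape(b, c, h, w)


class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 2, use_attention: bool = True):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # The attention block is optional, mirroring the paper's comparison of
        # fully convolutional baselines against attention-augmented variants.
        self.attn = SelfAttention2d(64) if use_attention else nn.Identity()
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.attn(self.features(x)))


if __name__ == "__main__":
    model = TinyCNN(num_classes=2, use_attention=True)
    logits = model(torch.randn(4, 3, 224, 224))
    print(logits.shape)  # torch.Size([4, 2])

As the abstract notes, the comparison against similarly sized baselines uses balanced accuracy, which weights each class equally and is therefore robust to the class imbalance common in medical screening datasets.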


URL

https://arxiv.org/abs/2404.12295

PDF

https://arxiv.org/pdf/2404.12295.pdf

