Abstract
A substantial body of research has focused on developing systems that assist medical professionals during labor-intensive early screening processes, many based on convolutional deep-learning architectures. Recently, multiple studies have explored the application of so-called self-attention mechanisms in the vision domain. These studies often report empirical improvements over fully convolutional approaches on various datasets and tasks. To evaluate this trend for medical imaging, we extend two widely adopted convolutional architectures with different self-attention variants on two different medical datasets. With this, we aim to specifically evaluate the possible advantages of additional self-attention. We compare our models with similarly sized convolutional and attention-based baselines and evaluate performance gains statistically. Additionally, we investigate how including such layers changes the features learned by these models during training. Following a hyperparameter search, and contrary to our expectations, we observe no significant improvement in balanced accuracy over fully convolutional models. We also find that important features, such as dermoscopic structures in skin lesion images, are still not learned when self-attention is employed. Finally, analyzing local explanations, we confirm biased feature usage. We conclude that merely incorporating attention is insufficient to surpass the performance of existing fully convolutional methods.
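The extension described above inserts self-attention layers into a convolutional backbone, letting every spatial location attend to every other. As a rough illustration (not the paper's implementation), the following minimal single-head sketch treats each spatial position of a CNN feature map as a token; the projection matrices here are random stand-ins for learned weights, and real variants add multi-head projections, positional encodings, and residual connections.

```python
import numpy as np

def self_attention_2d(feat, d_k=8, seed=0):
    """Single-head self-attention over a CNN feature map.

    feat: (C, H, W) array, as produced by a convolutional backbone.
    Hypothetical minimal sketch; random weights stand in for
    learned Q/K/V projections.
    """
    C, H, W = feat.shape
    x = feat.reshape(C, H * W).T               # (HW, C): one token per location
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv = (rng.standard_normal((C, d_k)) / np.sqrt(C) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv           # (HW, d_k) each
    scores = q @ k.T / np.sqrt(d_k)            # (HW, HW) pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)    # softmax over keys
    out = attn @ v                             # attended features per location
    return out.T.reshape(d_k, H, W), attn
```

Each output position is thus a weighted mixture of all positions' values, which is the global-context property these studies hope will complement the local receptive fields of convolutions.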
URL
https://arxiv.org/abs/2404.12295