Abstract
An efficient and effective decoding mechanism is crucial in medical image segmentation, especially in scenarios with limited computational resources. However, such decoding mechanisms usually come with high computational costs. To address this concern, we introduce EMCAD, a new efficient multi-scale convolutional attention decoder designed to optimize both performance and computational efficiency. EMCAD leverages a unique multi-scale depth-wise convolution block that significantly enhances feature maps through multi-scale convolutions. It also employs channel, spatial, and grouped (large-kernel) gated attention mechanisms, which are highly effective at capturing intricate spatial relationships while focusing on salient regions. By employing group and depth-wise convolutions, EMCAD is very efficient and scales well (e.g., only 1.91M parameters and 0.381G FLOPs are needed when using a standard encoder). Our rigorous evaluations across 12 datasets spanning six medical image segmentation tasks reveal that EMCAD achieves state-of-the-art (SOTA) performance with 79.4% and 80.3% reductions in #Params and #FLOPs, respectively. Moreover, EMCAD's adaptability to different encoders and versatility across segmentation tasks further establish it as a promising tool, advancing the field towards more efficient and accurate medical image analysis. Our implementation is publicly available.
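The efficiency claim rests on group and depth-wise convolutions, which use far fewer weights than a standard convolution of the same width. The sketch below is only an illustration of that arithmetic, not the authors' implementation: the channel width of 64, the group count of 4, and the kernel sizes (1, 3, 5) for the multi-scale branch are assumptions chosen for the example, and `conv2d_params` is a hypothetical helper.

```python
def conv2d_params(c_in, c_out, kernel, groups=1, bias=False):
    """Count learnable parameters of a 2D convolution with square kernels.

    Each output channel only sees c_in / groups input channels, so the
    weight count is (c_in / groups) * c_out * kernel * kernel.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("groups must divide both channel counts")
    weights = (c_in // groups) * c_out * kernel * kernel
    return weights + (c_out if bias else 0)


channels = 64  # assumed feature-map width, for illustration only

# Standard 3x3 convolution: every output channel mixes all input channels.
standard = conv2d_params(channels, channels, 3)

# Grouped 3x3 convolution with 4 groups (assumed group count).
grouped = conv2d_params(channels, channels, 3, groups=4)

# Depth-wise 3x3 convolution: one filter per channel (groups == channels).
depthwise = conv2d_params(channels, channels, 3, groups=channels)

# Multi-scale depth-wise branch: parallel depth-wise convolutions at
# several kernel sizes (the sizes here are assumed, not from the abstract).
multi_scale = sum(
    conv2d_params(channels, channels, k, groups=channels) for k in (1, 3, 5)
)

print(f"standard 3x3:          {standard}")
print(f"grouped 3x3 (g=4):     {grouped}")
print(f"depth-wise 3x3:        {depthwise}")
print(f"multi-scale depthwise: {multi_scale}")
```

Under these assumptions, three parallel depth-wise branches together still need fewer weights than a single standard 3x3 convolution, which is the general reason such blocks stay cheap as the decoder widens.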
URL
https://arxiv.org/abs/2405.06880