Abstract
Diffusion Probabilistic Models (DPMs) have recently gained popularity in computer vision through image generation applications such as Imagen, Latent Diffusion Models, and Stable Diffusion, which have demonstrated impressive capabilities and sparked much discussion within the community. Recent studies have also found DPMs useful in medical image analysis, as evidenced by the strong performance of the medical image segmentation model MedSegDiff on various tasks. While these models were originally designed with a UNet backbone, they may also benefit from vision transformer techniques. However, we found that simply combining the two approaches resulted in subpar performance. In this paper, we propose a novel transformer-based conditional UNet framework, together with a new Spectrum-Space Transformer (SS-Former) that models the interaction between noise and semantic features. This architectural improvement yields a new diffusion-based medical image segmentation method, MedSegDiff-V2, which significantly improves on the performance of MedSegDiff. We verified the effectiveness of MedSegDiff-V2 on eighteen organs across five segmentation datasets with different image modalities. Our experimental results show that MedSegDiff-V2 outperforms state-of-the-art (SOTA) methods by a considerable margin, further demonstrating the generalizability and effectiveness of the proposed model.
URL
https://arxiv.org/abs/2301.11798