MedSegDiff-V2: Diffusion based Medical Image Segmentation with Transformer

2023-01-19 03:42:36
Junde Wu, Rao Fu, Huihui Fang, Yu Zhang, Yanwu Xu

Abstract

The Diffusion Probabilistic Model (DPM) has recently gained popularity in the field of computer vision, thanks to its image generation applications, such as Imagen, Latent Diffusion Models, and Stable Diffusion, which have demonstrated impressive capabilities and sparked much discussion within the community. Recent studies have also found DPM to be useful in the field of medical image analysis, as evidenced by the strong performance of the medical image segmentation model MedSegDiff in various tasks. While these models were originally designed with a UNet backbone, they may also benefit from the incorporation of vision transformer techniques. However, we discovered that simply combining these two approaches resulted in subpar performance. In this paper, we propose a novel transformer-based conditional UNet framework, as well as a new Spectrum-Space Transformer (SS-Former) to model the interaction between noise and semantic features. This architectural improvement leads to a new diffusion-based medical image segmentation method called MedSegDiff-V2, which significantly improves the performance of MedSegDiff. We have verified the effectiveness of MedSegDiff-V2 on eighteen organs across five segmentation datasets with different image modalities. Our experimental results demonstrate that MedSegDiff-V2 outperforms state-of-the-art (SOTA) methods by a considerable margin, further proving the generalizability and effectiveness of the proposed model.

URL

https://arxiv.org/abs/2301.11798

PDF

https://arxiv.org/pdf/2301.11798.pdf
