Paper Reading AI Learner

Multimodal Information Interaction for Medical Image Segmentation

2024-04-25 07:21:14
Xinxin Fan, Lin Liu, Haoran Zhang

Abstract

The use of multimodal data for assisted diagnosis and segmentation has emerged as a prominent area of current research. One of the primary challenges, however, is how to fuse multimodal features effectively. Most current approaches focus on integrating multimodal features while ignoring the correlation and consistency between features of different modalities, which leads to the inclusion of potentially irrelevant information. To address this issue, we introduce an innovative Multimodal Information Cross Transformer (MicFormer), which employs a dual-stream architecture to extract features from each modality simultaneously. Leveraging the Cross Transformer, it queries features from one modality and retrieves the corresponding responses from the other, facilitating effective communication between the bimodal features. Additionally, we incorporate a deformable Transformer architecture to expand the search space. We conducted experiments on the MM-WHS dataset; in the CT-MRI multimodal image segmentation task, we improved the whole-heart segmentation DICE score to 85.57 and the MIoU to 75.51, outperforming other multimodal segmentation techniques by margins of 2.83 and 4.23, respectively. This demonstrates the efficacy of MicFormer in integrating relevant information between different modalities in multimodal tasks. These findings hold significant implications for multimodal image tasks, and we believe that MicFormer has extensive potential for broader applications across various domains. Access to our method is available at this https URL
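
As a rough illustration of the cross-modal querying described in the abstract, the PyTorch sketch below shows one standard cross-attention step in which tokens from one modality (e.g. CT) act as queries and tokens from the other (e.g. MRI) supply keys and values. It uses plain multi-head attention rather than the paper's deformable variant, and the module name, tensor shapes, and dimensions are illustrative assumptions, not the authors' released code.

import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    # Hypothetical sketch of bimodal feature communication, not the paper's implementation.
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)  # (B, N, C) layout
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, feat_a, feat_b):
        # feat_a: (B, N, C) tokens from modality A (queries, e.g. CT)
        # feat_b: (B, M, C) tokens from modality B (keys/values, e.g. MRI)
        q = self.norm_q(feat_a)
        kv = self.norm_kv(feat_b)
        fused, _ = self.attn(q, kv, kv)
        return feat_a + fused  # residual connection keeps modality A's own features

# Toy usage: random tensors stand in for CT/MRI feature tokens; a dual-stream
# network would apply this block in both directions (CT queries MRI, MRI queries CT).
ct_tokens = torch.randn(2, 512, 256)
mri_tokens = torch.randn(2, 512, 256)
fused_ct = CrossModalAttention()(ct_tokens, mri_tokens)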

Abstract (translated)

The use of multimodal data in assisted diagnosis and segmentation has become a prominent area of current research. A major challenge, however, is how to fuse multimodal features effectively. Most current methods focus on integrating multimodal features while ignoring the correlation and consistency between features of different modalities, which leads to the inclusion of potentially irrelevant information. To address this problem, we introduce an innovative Multimodal Information Cross Transformer (MicFormer), which adopts a dual-stream architecture to extract features from each modality simultaneously. Using the Cross Transformer, it queries features from one modality and retrieves the corresponding responses from the other, enabling effective communication between the two modalities' features. In addition, we introduce a deformable Transformer architecture to expand the search space. We conducted experiments on the MM-WHS dataset; in the CT-MRI multimodal image segmentation task, we improved the whole-heart segmentation DICE score to 85.57 and the MIoU to 75.51, exceeding other multimodal segmentation techniques by margins of 2.83 and 4.23, respectively. These findings are of significant importance for multimodal image tasks, and we believe MicFormer has great potential for broader applications across various domains. Access to our method is available at this https URL

URL

https://arxiv.org/abs/2404.16371

PDF

https://arxiv.org/pdf/2404.16371.pdf

