Abstract
The use of multimodal data in assisted diagnosis and segmentation has emerged as a prominent area of interest in current research. However, one of the primary challenges is how to effectively fuse multimodal features. Most current approaches focus on integrating multimodal features while ignoring the correlation and consistency between features from different modalities, which leads to the inclusion of potentially irrelevant information. To address this issue, we introduce an innovative Multimodal Information Cross Transformer (MicFormer), which employs a dual-stream architecture to extract features from each modality simultaneously. Leveraging the Cross Transformer, it queries features from one modality and retrieves corresponding responses from the other, facilitating effective communication between bimodal features. Additionally, we incorporate a deformable Transformer architecture to expand the search space. We conducted experiments on the MM-WHS dataset; in the CT-MRI multimodal image segmentation task, we improved the whole-heart segmentation Dice score to 85.57 and the MIoU to 75.51. Compared to other multimodal segmentation techniques, our method outperforms them by margins of 2.83 and 4.23, respectively. This demonstrates the efficacy of MicFormer in integrating relevant information across modalities in multimodal tasks. These findings hold significant implications for multimodal image tasks, and we believe that MicFormer has extensive potential for broader applications across various domains. Access to our method is available at this https URL
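The core fusion idea described above, querying features from one modality against another, is cross-attention. The sketch below is a minimal NumPy illustration of that mechanism, not the paper's MicFormer implementation: the dual-stream backbone and deformable attention are omitted, and all names (`feat_a`, `cross_attention`, the 16-token feature maps) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(feat_a, feat_b, w_q, w_k, w_v):
    """Queries come from modality A; keys and values from modality B,
    so each A-token retrieves a response from B's feature map."""
    q = feat_a @ w_q                      # (n_a, d)
    k = feat_b @ w_k                      # (n_b, d)
    v = feat_b @ w_v                      # (n_b, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (n_a, n_b) similarity
    return softmax(scores, axis=-1) @ v       # A-tokens fused with B

rng = np.random.default_rng(0)
d = 8                                     # feature dimension (toy size)
ct  = rng.standard_normal((16, d))        # 16 tokens from the CT stream
mri = rng.standard_normal((16, d))        # 16 tokens from the MRI stream
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))

fused = cross_attention(ct, mri, w_q, w_k, w_v)
print(fused.shape)  # (16, 8): one fused vector per CT query token
```

In a bidirectional dual-stream design, the same operation would also run with the roles swapped (MRI queries against CT keys/values), so each stream is enriched by the other.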
URL
https://arxiv.org/abs/2404.16371