Abstract
Multi-modal image fusion aims to combine information from different modalities into a single image with comprehensive information and detailed textures. However, fusion models based on convolutional neural networks are limited in capturing global image features because convolution is a local operation. Transformer-based models excel at global feature modeling but face computational challenges stemming from their quadratic complexity. Recently, the Selective Structured State Space Model has exhibited significant potential for long-range dependency modeling with linear complexity, offering a promising avenue out of this dilemma. In this paper, we propose FusionMamba, a novel dynamic feature enhancement method for multi-modal image fusion based on Mamba. Specifically, we devise an improved, efficient Mamba model for image fusion that integrates an efficient visual state space model with dynamic convolution and channel attention. This refined model not only preserves Mamba's performance and global modeling capability but also reduces channel redundancy and strengthens local feature enhancement. Additionally, we devise a dynamic feature fusion module (DFFM) comprising two dynamic feature enhancement modules (DFEM) and a cross-modality fusion Mamba module (CMFM). The former performs dynamic texture enhancement and dynamic difference perception, whereas the latter enhances correlated features between modalities and suppresses redundant inter-modal information. FusionMamba achieves state-of-the-art (SOTA) performance on multiple multi-modal medical image fusion tasks (CT-MRI, PET-MRI, SPECT-MRI), an infrared and visible image fusion task (IR-VIS), and a multi-modal biomedical image fusion dataset (GFP-PC), demonstrating the model's generalization ability. The code for FusionMamba is available at this https URL.
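To make the two key ingredients of the abstract concrete, the sketch below illustrates (a) why a selective state-space scan has linear rather than quadratic cost in sequence length, and (b) how a channel-attention gate can suppress redundant channels. This is a minimal toy sketch in NumPy with scalar state and random placeholder weights; the function names, shapes, and parameters are assumptions for illustration, not the paper's actual FusionMamba implementation.

```python
import numpy as np

def selective_scan(x, w_a, w_b, w_c):
    """Toy 1-D selective scan: h_t = a_t*h_{t-1} + b_t*x_t, y_t = c_t*h_t.
    The decay a_t depends on the input ("selective"), and one pass over the
    sequence costs O(T) -- linear, unlike attention's O(T^2) pairwise cost."""
    h, ys = 0.0, []
    for xt in x:
        a = 1.0 / (1.0 + np.exp(-w_a * xt))  # input-dependent decay in (0, 1)
        h = a * h + w_b * xt                 # state update (long-range memory)
        ys.append(w_c * h)                   # readout
    return np.array(ys)

def channel_attention(feat, w1, w2):
    """SE-style channel attention on a (C, H, W) map: squeeze by global
    average pooling, compute a sigmoid gate per channel, then rescale --
    channels gated near 0 are effectively suppressed as redundant."""
    z = feat.mean(axis=(1, 2))  # squeeze: per-channel descriptor, shape (C,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))
    return feat * gate[:, None, None]  # per-channel rescale

rng = np.random.default_rng(0)
y = selective_scan(rng.standard_normal(64), 0.5, 0.3, 1.0)   # (64,) output
feat = rng.standard_normal((8, 16, 16))                       # hypothetical weights
out = channel_attention(feat, rng.standard_normal((4, 8)), rng.standard_normal((8, 4)))
```

Since the gate lies in (0, 1), every output channel is a damped copy of its input; a trained module would learn which channels to keep.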
URL
https://arxiv.org/abs/2404.09498