Paper Reading AI Learner

FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba

2024-04-15 06:37:21
Xinyu Xie, Yawen Cui, Chio-In Ieong, Tao Tan, Xiaozhi Zhang, Xubin Zheng, Zitong Yu

Abstract

Multimodal image fusion aims to combine information from different modalities into a single image with comprehensive information and detailed textures. However, fusion models based on convolutional neural networks are limited in capturing global image features because they rely on local convolution operations. Transformer-based models excel at global feature modeling but face computational challenges stemming from the quadratic complexity of self-attention with respect to sequence length. Recently, the Selective Structured State Space Model has shown significant potential for long-range dependency modeling with linear complexity, offering a promising way to resolve this dilemma. In this paper, we propose FusionMamba, a novel dynamic feature enhancement method for multimodal image fusion with Mamba. Specifically, we devise an improved, efficient Mamba model for image fusion that integrates an efficient visual state space model with dynamic convolution and channel attention. This refined model not only preserves the performance and global modeling capability of Mamba but also reduces channel redundancy and strengthens local feature enhancement. Additionally, we devise a dynamic feature fusion module (DFFM) comprising two dynamic feature enhancement modules (DFEM) and a cross-modality fusion Mamba module (CMFM). The former performs dynamic texture enhancement and dynamic difference perception, whereas the latter enhances correlated features between modalities and suppresses redundant intermodal information. FusionMamba achieves state-of-the-art (SOTA) performance across various multimodal medical image fusion tasks (CT-MRI, PET-MRI, SPECT-MRI), the infrared and visible image fusion task (IR-VIS), and a multimodal biomedical image fusion dataset (GFP-PC), demonstrating the generalization ability of our model. The code for FusionMamba is available at this https URL.
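The abstract's central argument is that a selective state space model captures long-range dependencies in linear time. Below is a minimal single-channel sketch of a Mamba-style selective scan, written for illustration and not taken from the FusionMamba code. The function name `selective_scan`, the scalar weights `w_delta`, `b_delta`, `w_B`, `w_C`, `d_skip`, and the simplified Euler discretization of B are hypothetical choices. The scan makes one pass over the sequence with a fixed-size state, so its cost grows as O(L·N) for sequence length L and state size N, versus the O(L²) pairwise scores of self-attention.

```python
import math
from typing import List, Sequence


def softplus(z: float) -> float:
    # Numerically stable softplus: log(1 + e^z)
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(
    x: Sequence[float],
    A: Sequence[float],      # diagonal state matrix; entries should be negative for stability
    w_delta: float,          # hypothetical weight producing the step size from the input
    b_delta: float,          # hypothetical bias for the step size
    w_B: Sequence[float],    # hypothetical weights producing the input-dependent B_t
    w_C: Sequence[float],    # hypothetical weights producing the input-dependent C_t
    d_skip: float = 0.0,     # skip connection term (the "D" term in Mamba)
) -> List[float]:
    """Sketch of a single-channel selective SSM scan (assumed simplified form)."""
    n = len(A)
    h = [0.0] * n            # hidden state of fixed size N, reused at every step
    ys: List[float] = []
    for x_t in x:
        # "Selective": step size, B and C all depend on the current input
        delta = softplus(w_delta * x_t + b_delta)
        B_t = [w * x_t for w in w_B]
        C_t = [w * x_t for w in w_C]
        for i in range(n):
            a_bar = math.exp(delta * A[i])   # zero-order-hold discretization of A
            b_bar = delta * B_t[i]           # simplified (Euler) discretization of B
            h[i] = a_bar * h[i] + b_bar * x_t
        # Readout y_t = C_t . h_t + D * x_t
        ys.append(sum(c * hi for c, hi in zip(C_t, h)) + d_skip * x_t)
    return ys
```

For an H×W feature map, a row-major flatten such as `[v for row in fmap for v in row]` yields a sequence of length H·W for this scan; visual state space models generally scan such sequences in several directions, which the sketch omits.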

URL

https://arxiv.org/abs/2404.09498

PDF

https://arxiv.org/pdf/2404.09498.pdf
