
S$^2$Mamba: A Spatial-spectral State Space Model for Hyperspectral Image Classification

2024-04-28 15:12:56
Guanchun Wang, Xiangrong Zhang, Zelin Peng, Tianyang Zhang, Xiuping Jia, Licheng Jiao

Abstract

Land cover analysis using hyperspectral images (HSI) remains an open problem because of their low spatial resolution and complex spectral information. Recent studies have focused primarily on Transformer-based architectures for modeling spatial-spectral long-range dependencies, which is computationally expensive because of its quadratic complexity. The selective structured state space model (Mamba), which models long-range dependencies efficiently with linear complexity, has recently shown promising progress. However, its potential in hyperspectral image processing, which requires handling numerous spectral bands, has not yet been explored. In this paper, we propose S$^2$Mamba, a spatial-spectral state space model for hyperspectral image classification that extracts spatial-spectral contextual features, resulting in more efficient and accurate land cover analysis. S$^2$Mamba uses two selective structured state space models that operate along different dimensions for feature extraction, one spatial and one spectral, along with a spatial-spectral mixture gate for optimal fusion. More specifically, S$^2$Mamba first captures spatial contextual relations by letting each pixel interact with its adjacent pixels through a Patch Cross Scanning module, and then explores semantic information from continuous spectral bands through a Bi-directional Spectral Scanning module. Considering the distinct expertise of the two attributes in homogeneous and complicated texture scenes, we realize the Spatial-spectral Mixture Gate with a group of learnable matrices, allowing representations learned across different dimensions to be incorporated adaptively. Extensive experiments conducted on HSI classification benchmarks demonstrate the superiority and promise of S$^2$Mamba. The code will be available at: this https URL.
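The ideas in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: it replaces the selective state space model with a fixed scalar recurrence, and it replaces the learnable gate matrices with per-feature gate logits. The names `linear_scan`, `bidirectional_scan`, and `mixture_gate`, and the parameters `a`, `b`, `w_spatial`, and `w_spectral`, are hypothetical and chosen only for illustration. The sketch shows why a scan costs time linear in the sequence length, how a sequence of spectral bands might be scanned in both directions, and how a gate could blend two feature branches.

```python
import math


def linear_scan(xs, a=0.9, b=0.1):
    """Toy scalar recurrence h_t = a * h_{t-1} + b * x_t.

    One pass over the sequence, so the cost grows linearly with len(xs).
    A real selective SSM would make a and b depend on the input (assumption:
    fixed here to keep the example small).
    """
    h = 0.0
    out = []
    for x in xs:
        h = a * h + b * x
        out.append(h)
    return out


def bidirectional_scan(xs, a=0.9, b=0.1):
    """Scan the sequence forward and backward, then sum the two passes."""
    forward = linear_scan(xs, a, b)
    backward = linear_scan(xs[::-1], a, b)[::-1]
    return [f + r for f, r in zip(forward, backward)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mixture_gate(spatial, spectral, w_spatial, w_spectral):
    """Blend two feature vectors with per-feature gate logits.

    sigmoid(ws - wp) equals the softmax weight of the spatial branch
    over the two logits (ws, wp). The logits stand in for learned
    parameters (assumption).
    """
    fused = []
    for s, p, ws, wp in zip(spatial, spectral, w_spatial, w_spectral):
        g = sigmoid(ws - wp)
        fused.append(g * s + (1.0 - g) * p)
    return fused


# Hypothetical values for one pixel: five spectral bands and five
# spatial features of the same length.
spectrum = [0.2, 0.4, 0.9, 0.7, 0.3]
spatial_features = [0.5, 0.5, 0.6, 0.4, 0.5]

spectral_features = bidirectional_scan(spectrum)
fused = mixture_gate(
    spatial_features,
    spectral_features,
    w_spatial=[0.0] * 5,
    w_spectral=[0.0] * 5,
)
```

With equal gate logits, each fused feature is the plain average of the two branches; training would shift the logits so that one branch dominates where it is more informative.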


URL

https://arxiv.org/abs/2404.18213

PDF

https://arxiv.org/pdf/2404.18213.pdf

