A Survey on Visual Mamba

Abstract

State space models (SSMs) with selection mechanisms and hardware-aware architectures, namely Mamba, have recently demonstrated significant promise in long-sequence modeling. Because the self-attention mechanism in transformers has quadratic complexity in image size, and computational demands keep growing, researchers are now exploring how to adapt Mamba to computer vision tasks. This paper is the first comprehensive survey aiming to provide an in-depth analysis of Mamba models in the field of computer vision. We begin by exploring the foundational concepts contributing to Mamba's success, including the state space model framework, selection mechanisms, and hardware-aware design. Next, we review vision Mamba models, categorizing them into foundational models and those enhanced with techniques such as convolution, recurrence, and attention. We then examine the widespread applications of Mamba in vision tasks, including its use as a backbone at various levels of vision processing. These applications cover general visual tasks, medical visual tasks (e.g., 2D/3D segmentation, classification, and image registration), and remote sensing visual tasks. We present general visual tasks at two levels: high/mid-level vision (e.g., object detection, segmentation, and video classification) and low-level vision (e.g., image super-resolution, image restoration, and visual generation). We hope this endeavor will spark additional interest within the community to address current challenges and further apply Mamba models in computer vision.

URL

https://arxiv.org/abs/2404.15956

PDF

https://arxiv.org/pdf/2404.15956.pdf
