Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges

Abstract

Sequence modeling is a crucial area across various domains, including natural language processing (NLP), speech recognition, time series forecasting, music generation, and bioinformatics. Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) historically dominated sequence modeling tasks such as machine translation and named entity recognition (NER). However, the advancement of transformers has shifted this paradigm, given their superior performance. Yet transformers suffer from $O(N^2)$ attention complexity and challenges in handling inductive bias. Several variations that use spectral networks or convolutions have been proposed to address these issues and have performed well on a range of tasks. However, they still have difficulty dealing with long sequences. In this context, State Space Models (SSMs) have emerged as promising alternatives for sequence modeling, especially with the advent of S4 and its variants, such as S4ND, HiPPO, Hyena, Diagonal State Spaces (DSS), Gated State Spaces (GSS), Linear Recurrent Unit (LRU), Liquid-S4, and Mamba. In this survey, we categorize the foundational SSMs into three paradigms: gating architectures, structural architectures, and recurrent architectures. The survey also highlights diverse applications of SSMs across domains such as vision, video, audio, speech, language (especially long sequence modeling), medicine (including genomics), chemistry (such as drug design), recommendation systems, and time series analysis, including tabular data. Moreover, we consolidate the performance of SSMs on benchmark datasets such as Long Range Arena (LRA), WikiText, GLUE, Pile, ImageNet, Kinetics-400, and SSv2, as well as on video datasets such as Breakfast, COIN, and LVU, and on various time series datasets. The project page for the Mamba-360 work is available at this https URL.
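
The abstract contrasts the $O(N^2)$ cost of attention with SSMs as an alternative for long sequences. As a rough illustration that is not taken from the survey, the sketch below shows a toy one-dimensional linear state space model computed two equivalent ways: as a recurrence that makes a single pass over the sequence, and as a convolution with a kernel derived from the same parameters. The function names (`ssm_scan`, `ssm_kernel`, `ssm_conv`) and the scalar parameters `a`, `b`, `c` are hypothetical, chosen only for this example; real models such as S4 or Mamba use structured matrices and learned or input-dependent parameters.

```python
# Illustrative sketch only: a scalar-state linear SSM.
# All names and parameter values here are assumptions for the example,
# not an API or algorithm from the survey.

def ssm_scan(u, a, b, c):
    """Recurrent form: x_k = a * x_{k-1} + b * u_k, y_k = c * x_k.

    One pass over the input, so time grows linearly with sequence length
    and the state kept between steps has constant size.
    """
    x = 0.0
    ys = []
    for u_k in u:
        x = a * x + b * u_k
        ys.append(c * x)
    return ys


def ssm_kernel(n, a, b, c):
    """Convolution kernel K_j = c * a**j * b for j = 0..n-1."""
    return [c * (a ** j) * b for j in range(n)]


def ssm_conv(u, a, b, c):
    """Convolutional form: y_i = sum_{j<=i} K_j * u_{i-j}.

    Written naively (quadratic time) for clarity; an FFT-based
    convolution would be the usual way to speed this up.
    """
    k = ssm_kernel(len(u), a, b, c)
    return [sum(k[j] * u[i - j] for j in range(i + 1)) for i in range(len(u))]


if __name__ == "__main__":
    u = [1.0, 2.0, 3.0]
    print(ssm_scan(u, a=0.5, b=1.0, c=2.0))
    print(ssm_conv(u, a=0.5, b=1.0, c=2.0))
```

Both functions return the same outputs for the same inputs, which is the property that lets this family of models train in a parallel, convolution-like mode and run inference step by step.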

URL

https://arxiv.org/abs/2404.16112

PDF

https://arxiv.org/pdf/2404.16112.pdf
