Mamba

Reference: A Visual Guide to Mamba and State Space Models

🥥 Table of Contents

Part 1: The Issues of the Transformer

Part 2: State Space Model (SSM)

Part 3: Mamba: A Selective SSM

🥑 Get Started!

Article 1: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Article 2: Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Article 3: Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Article 4: MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Article 5: S4: Efficiently Modeling Long Sequences with Structured State Spaces
Blog 1: A Thorough Guide to Mamba, the Model Aiming to Overturn the Transformer: From SSM, HiPPO, and S4 to Mamba | CSDN
Blog 2: A Thorough Understanding of FlashAttention and FlashAttention-2: Reducing GPU Memory Reads/Writes and Speeding Up Computation
Video 1: The Next Big Thing? Deriving Mamba's Formulas by Hand and Coding It from Scratch | Bilibili
Video 2: [PhD Vlog] A Detailed Explanation of Mamba, the Latest Model of 2024: Is the Transformer Dead? Everything You Want to Know Is Here! | Bilibili
Video 3: Vision in Ten Minutes | An Explanation of the Mamba Model (Covering Transformer, RNN, SSM, and S4) | Bilibili
Video 4: Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrence, Convolution, and Math | Bilibili

State Space Model (SSM)

| Model | Training Phase | Inference Phase | Additional Issue |
| --- | --- | --- | --- |
| RNN (1986) | Slow (not parallelizable) | Fast (scales linearly with sequence length) | Rapid forgetting |
| LSTM (1997) | Slow | Fast | Forgetting |
| Transformer (2017) | Fast (parallelizable) | Slow (scales quadratically with sequence length) | RAM & time: \(O(N^2)\) |
| Mamba (2023) | Fast | Fast (scales linearly with sequence length, unbounded context length) | RAM & time: \(O(N)\) |

\(h_t = \bar{A} h_{t-1} + \bar{B} x_t, \quad y_t = C h_t\)
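To illustrate why the table above lists both fast training and fast inference, here is a minimal NumPy sketch (not the Mamba implementation) of a discretized SSM computed in two equivalent ways: the recurrence above, which gives \(O(N)\) inference, and the convolutional form, which allows parallel training. The matrices \(\bar{A}\), \(\bar{B}\), \(C\) are toy example values, not parameters from any trained model.

```python
# Minimal sketch of a discretized state space model (SSM).
# A_bar, B_bar, C are toy example values chosen for illustration.
import numpy as np

def ssm_recurrent(A_bar, B_bar, C, x):
    """Recurrent form: h_t = A_bar @ h_{t-1} + B_bar * x_t,  y_t = C @ h_t.
    One step per token -> O(N) time for a length-N sequence."""
    n = A_bar.shape[0]
    h = np.zeros(n)
    ys = []
    for x_t in x:
        h = A_bar @ h + B_bar * x_t
        ys.append(C @ h)
    return np.array(ys)

def ssm_convolutional(A_bar, B_bar, C, x):
    """Convolutional form used for parallel training: y = K * x,
    where K_k = C @ A_bar^k @ B_bar. Equivalent to the recurrence above."""
    L = len(x)
    K = np.array([C @ np.linalg.matrix_power(A_bar, k) @ B_bar for k in range(L)])
    # Causal convolution: y_t = sum_{k <= t} K_k * x_{t-k}
    return np.array([sum(K[k] * x[t - k] for k in range(t + 1)) for t in range(L)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4                        # hidden state size
    A_bar = 0.9 * np.eye(n)      # toy discretized state matrix
    B_bar = rng.normal(size=n)   # toy input matrix
    C = rng.normal(size=n)       # toy output matrix
    x = rng.normal(size=16)      # input sequence
    print(np.allclose(ssm_recurrent(A_bar, B_bar, C, x),
                      ssm_convolutional(A_bar, B_bar, C, x)))  # True
```

Both forms produce the same output. In Mamba's selective SSM, \(\bar{A}\) and \(\bar{B}\) become input-dependent, so a fixed convolution kernel no longer exists and a hardware-efficient parallel scan is used for training instead.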

Flash Attention
