Mamba

Reference: A Visual Guide to Mamba and State Space Models

🥥 Table of Contents

Part 1: The Issues of the Transformer

Part 2: State Space Model (SSM)

Part 3: Mamba: A Selective SSM

🥑 Get Started!

Article 1: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Article 2: Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Article 3: Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Article 4: MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Article 5: S4: Efficiently Modeling Long Sequences with Structured State Spaces
Blog 1: A Thorough Guide to Mamba, the Model Aiming to Overturn the Transformer: From SSM, HiPPO, and S4 to Mamba | CSDN
Blog 2: A Thorough Understanding of FlashAttention and FlashAttention-2: Reducing GPU Memory Reads/Writes and Speeding Up Computation
Video 1: The Next Big Thing? Deriving Mamba's Formulas by Hand and Coding It from Scratch | Bilibili
Video 2: [PhD Vlog] A Detailed Explanation of Mamba, the Latest Model of 2024: Is the Transformer Dead? Everything You Want to Know Is Here! | Bilibili
Video 3: Vision in Ten Minutes | An Explanation of the Mamba Model (Covering Transformer, RNN, SSM, and S4) | Bilibili
Video 4: Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrence, Convolution, and Math | Bilibili

State Space Model (SSM)

| Model | Training Phase | Inference Phase | Additional Issue |
| --- | --- | --- | --- |
| RNN (1986) | Slow (not parallelizable) | Fast (scales linearly with sequence length) | Rapid forgetting |
| LSTM (1997) | Slow | Fast | Forgetting |
| Transformer (2017) | Fast (parallelizable) | Slow (scales quadratically with sequence length) | RAM & time: \(O(N^2)\) |
| Mamba (2023) | Fast | Fast (scales linearly with sequence length, unbounded context length) | RAM & time: \(O(N)\) |

\(h_t = \bar{A} h_{t-1} + \bar{B} x_t, \quad y_t = C h_t\)
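To illustrate why the table above lists both fast training and fast inference, here is a minimal NumPy sketch (not the Mamba implementation) of a discretized SSM computed in two equivalent ways: the recurrence above, which gives \(O(N)\) inference, and the convolutional form, which allows parallel training. The matrices \(\bar{A}\), \(\bar{B}\), \(C\) are toy example values, not parameters from any trained model.

```python
# Minimal sketch of a discretized state space model (SSM).
# A_bar, B_bar, C are toy example values chosen for illustration.
import numpy as np

def ssm_recurrent(A_bar, B_bar, C, x):
    """Recurrent form: h_t = A_bar @ h_{t-1} + B_bar * x_t,  y_t = C @ h_t.
    One step per token -> O(N) time for a length-N sequence."""
    n = A_bar.shape[0]
    h = np.zeros(n)
    ys = []
    for x_t in x:
        h = A_bar @ h + B_bar * x_t
        ys.append(C @ h)
    return np.array(ys)

def ssm_convolutional(A_bar, B_bar, C, x):
    """Convolutional form used for parallel training: y = K * x,
    where K_k = C @ A_bar^k @ B_bar. Equivalent to the recurrence above."""
    L = len(x)
    K = np.array([C @ np.linalg.matrix_power(A_bar, k) @ B_bar for k in range(L)])
    # Causal convolution: y_t = sum_{k <= t} K_k * x_{t-k}
    return np.array([sum(K[k] * x[t - k] for k in range(t + 1)) for t in range(L)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4                        # hidden state size
    A_bar = 0.9 * np.eye(n)      # toy discretized state matrix
    B_bar = rng.normal(size=n)   # toy input matrix
    C = rng.normal(size=n)       # toy output matrix
    x = rng.normal(size=16)      # input sequence
    print(np.allclose(ssm_recurrent(A_bar, B_bar, C, x),
                      ssm_convolutional(A_bar, B_bar, C, x)))  # True
```

Both forms produce the same output. In Mamba's selective SSM, \(\bar{A}\) and \(\bar{B}\) become input-dependent, so a fixed convolution kernel no longer exists and a hardware-efficient parallel scan is used for training instead.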

Flash Attention
