DeepSeek Notes

V3 paper: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf

R1 paper: https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf

V3 readout:

To improve efficiency and reduce cost, V3:

  • adopts Multi-head Latent Attention (MLA)
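
To make the MLA point concrete, here is a minimal sketch of the core idea as I understand it: instead of caching full per-head keys and values, each token is down-projected to one small shared latent vector, which is cached, and keys and values are rebuilt from it when needed. The sizes, the weight names (W_down, W_up_k, W_up_v), and the helper functions are all made up for illustration; the decoupled RoPE key that the paper also caches is left out, and the weights are random rather than trained.

```python
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    """Random weights stand in for trained parameters."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Toy sizes (assumptions for illustration, not the model's real dimensions).
D_MODEL, N_HEADS, D_HEAD, D_LATENT = 16, 4, 4, 3

rng = random.Random(0)
W_down = rand_matrix(D_LATENT, D_MODEL, rng)           # hidden state -> shared KV latent
W_up_k = rand_matrix(N_HEADS * D_HEAD, D_LATENT, rng)  # latent -> keys for all heads
W_up_v = rand_matrix(N_HEADS * D_HEAD, D_LATENT, rng)  # latent -> values for all heads

def compress(h):
    """Return the latent vector that would be cached for one token."""
    return matvec(W_down, h)

def expand(c):
    """Rebuild the full keys and values from a cached latent."""
    return matvec(W_up_k, c), matvec(W_up_v, c)

h = [rng.uniform(-1, 1) for _ in range(D_MODEL)]  # one token's hidden state
c = compress(h)
k, v = expand(c)

# Cached numbers per token per layer.
mha_cache = 2 * N_HEADS * D_HEAD  # standard multi-head attention: keys + values
mla_cache = len(c)                # this sketch: the latent only
print(f"MHA caches {mha_cache} values per token, this MLA sketch caches {mla_cache}")
```

The saving comes from D_LATENT being much smaller than 2 * N_HEADS * D_HEAD, so the KV cache shrinks while every head still gets its own keys and values after the up-projection.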

 

posted @ 2025-02-05 10:23  lvmxh