DeepSeek Notes
V3 paper: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf
R1 paper: https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf
V3 readout:
To improve efficiency and reduce cost, V3:
- adopts Multi-head Latent Attention (MLA)