Post category - Mixture of Experts

With the same 70B parameters, why can MoE match Dense while activating only 13B?

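Summary: If you follow the development of large models, you have surely noticed a trend: more and more of the top-ranked open-source models use the MoE (Mixture of Experts) architecture. DeepSeek-V4 has 1.6T total parameters but activates only 49B, Qwen3 also comes in MoE versions, and Mixtral made its name on the MoE architecture. Meanwhile, Llama 4, Qw…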

posted @ 2026-04-24 19:12 iTech Views(128) Comments(0) Recommendations(0)

iTech's Blog

The Era of AI www.theaiera.cn
