Summary: Contents: Megatron: Reducing Activation Recomputation in Large Transformer Models | TL;DR | Method | SP (Sequence Parallelism) | Selective Recomputation | Code && Implem
posted @ 2025-06-03 21:24 fariver Views (59) Comments (0) Recommendations (0)