Summary: Contents: Overview, 1-bit Adam, 1-bit SGD, Code. Seide F., Fu H., Droppo J., Li G. and Yu D. 1-bit stochastic gradient descent and its application to data-parallel distribute Read more
posted @ 2025-02-13 21:28 馒头and花卷 Views (31) Comments (0) Recommendations (0)
Summary: Contents: Overview, Motivation, ZeRO, ZeRO-Offload, ZeRO-Infinite, ZeRO++, Code. Rajbhandari S., Rasley J., Ruwase O. and He Y. ZeRO: Memory optimizations toward training trillion Read more
posted @ 2025-02-13 14:33 馒头and花卷 Views (359) Comments (0) Recommendations (0)