Summary:
Contents: Overview, 1-bit Adam, 1-bit SGD, Code. Seide F., Fu H., Droppo J., Li G. and Yu D. 1-bit stochastic gradient descent and its application to data-parallel distribute...
posted @ 2025-02-13 21:28
馒头and花卷
Summary:
Contents: Overview, Motivation, ZeRO, ZeRO-Offload, ZeRO-Infinite, ZeRO++, Code. Rajbhandari S., Rasley J., Ruwase O. and He Y. ZeRO: Memory optimizations toward training trillion...
posted @ 2025-02-13 14:33
馒头and花卷
