Summary: With PEFT, we train only a small portion of the parameters. What uses memory while training a model? Trainable weights, optimizer states, gradients, Forwar... (Read more)
posted @ 2024-03-14 11:04 MiraMira Views(61) Comments(0) Recommendations(0)
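
To make the memory question in the summary concrete, here is a rough back-of-the-envelope sketch. It covers only the three items the summary names in full (weights, gradients, optimizer states); the remaining items are cut off in the teaser and are not modeled. The function name `training_memory_bytes`, the 4-bytes-per-value (fp32) setting, and the two optimizer states per trainable parameter (as an Adam-style optimizer would keep) are all assumptions for illustration, not something the post states.

```python
def training_memory_bytes(total_params, trainable_params,
                          bytes_per_value=4, optimizer_states_per_param=2):
    """Rough estimate of training memory, in bytes (hypothetical helper).

    Assumptions (not from the original post):
      - every value is stored in fp32 (4 bytes) unless overridden
      - the optimizer keeps 2 extra values per trainable parameter
      - all weights, frozen or trainable, must be held in memory
      - gradients and optimizer states exist only for trainable parameters
    """
    weights = total_params * bytes_per_value
    gradients = trainable_params * bytes_per_value
    optimizer_states = trainable_params * optimizer_states_per_param * bytes_per_value
    return {
        "weights": weights,
        "gradients": gradients,
        "optimizer_states": optimizer_states,
        "total": weights + gradients + optimizer_states,
    }


# Full fine-tuning of a 1B-parameter model: every parameter is trainable.
full = training_memory_bytes(1_000_000_000, 1_000_000_000)

# PEFT-style setup: same model, but only 1% of the parameters are trainable.
peft = training_memory_bytes(1_000_000_000, 10_000_000)

for name, est in (("full", full), ("peft", peft)):
    print(f"{name}: {est['total'] / 1e9:.2f} GB")
```

Under these assumptions the weights cost the same in both cases, and the saving comes entirely from the gradients and optimizer states, which shrink in proportion to the trainable fraction.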