[Repost] PPO Principles and Source Code Walkthrough

 

https://zhuanlan.zhihu.com/p/677607581

Illustrated LLM RLHF Series: PPO Principles and Source Code Walkthrough That Anyone Can Understand

 
 

https://github.com/deepspeedai/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/README.md

DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

 

https://zhuanlan.zhihu.com/p/658047809

RLHF Training: DeepSpeed Chat in Practice (a step-by-step, hand-holding tutorial)

 

posted @ 2025-12-27 12:26  blcblc