An Explanation of GRPO, DeepSeek's RL Algorithm

https://mp.weixin.qq.com/s?__biz=MzI4MDYzNzg4Mw==&mid=2247569612&idx=3&sn=a3e0e3fd74c391da56fe39d2ac5ba62c&chksm=eab415c5169c91180f00186aa3bfd96a94950f17c03181a279828a0f097a44f874287699b45e&scene=27

Posted on 2025-06-19 15:30 on Zhang Bo's blog (张博的博客)
