Let's review PPO.

https://zhuanlan.zhihu.com/p/614115887

Understanding the Proximal Policy Optimization (PPO) Algorithm: Starting from Policy Gradients

posted @ 2026-01-21 16:44  blcblc