Summary:
1. The simplest actor-critic (QAC) 2. Advantage actor-critic (A2C) 3. Off-policy actor-critic 4. Deterministic actor-critic (DPG)
posted @ 2025-04-10 16:56
penuel
Summary:
1. Basic idea of policy gradient: earlier policies were represented as tables; here the policy is described by a function instead. 2. Metric 1 - Average value 3. Metric 2 - Average reward 4. Gradients of the metrics 5. G
posted @ 2025-04-10 11:08
penuel
