Summary: 1. The simplest actor-critic (QAC) 2. Advantage actor-critic (A2C) 3. Off-policy actor-critic 4. Deterministic actor-critic (DPG)
posted @ 2025-04-10 16:56 penuel
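The first item, the simplest actor-critic (QAC), can be sketched as follows. This is a minimal sketch under stated assumptions, not the post's own code. The one-state, two-action toy problem and all function names (`step`, `softmax_policy`, `grad_log_pi`, `sample`, `qac`) are hypothetical. The critic keeps a tabular estimate q(s, a, w), which is linear approximation with one-hot features, and updates it with a SARSA-style TD step. The actor then follows θ ← θ + α_θ ∇_θ ln π(a|s, θ) q(s, a, w).

```python
import math
import random

# Hypothetical toy problem (an assumption, not from the post): a single state 0,
# two actions, continuing task. Action 1 pays reward 1, action 0 pays 0, and
# the agent always stays in state 0.
N_ACTIONS = 2
GAMMA = 0.9


def step(action):
    """Return (reward, next_state) for the toy problem."""
    return (1.0 if action == 1 else 0.0), 0


def softmax_policy(theta, s):
    """pi(a | s, theta): softmax over the action preferences theta[s]."""
    m = max(theta[s])  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in theta[s]]
    z = sum(exps)
    return [e / z for e in exps]


def grad_log_pi(theta, s, a):
    """Gradient of ln pi(a | s, theta) w.r.t. theta[s][b]: 1{b == a} - pi(b | s)."""
    probs = softmax_policy(theta, s)
    return [(1.0 if b == a else 0.0) - probs[b] for b in range(N_ACTIONS)]


def sample(probs, rng):
    """Draw an action index with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs)[0]


def qac(steps=20000, alpha_theta=0.01, alpha_w=0.1, seed=0):
    rng = random.Random(seed)
    theta = [[0.0] * N_ACTIONS]  # actor parameters, one row per state
    q = [[0.0] * N_ACTIONS]      # critic q(s, a, w) as a table (one-hot linear features)
    s = 0
    a = sample(softmax_policy(theta, s), rng)
    for _ in range(steps):
        r, s_next = step(a)
        a_next = sample(softmax_policy(theta, s_next), rng)
        # Critic (value update): SARSA-style TD step toward r + gamma * q(s', a').
        td_error = r + GAMMA * q[s_next][a_next] - q[s][a]
        q[s][a] += alpha_w * td_error
        # Actor (policy update): theta += alpha_theta * grad ln pi(a|s) * q(s, a).
        g = grad_log_pi(theta, s, a)
        for b in range(N_ACTIONS):
            theta[s][b] += alpha_theta * g[b] * q[s][a]
        s, a = s_next, a_next
    return theta, q
```

After `theta, q = qac()`, calling `softmax_policy(theta, 0)` typically shows most of the probability on action 1, the rewarding action. Because the actor multiplies by q(s, a, w) directly rather than by an advantage, individual updates are fairly noisy. Reducing that variance with a baseline is one motivation for A2C.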
Summary: 1. Basic idea of policy gradient: policies were previously represented as tables; now they are described by functions. 2. Metric 1 - Average value 3. Metric 2 - Average reward 4. Gradients of the metrics 5. G
posted @ 2025-04-10 11:08 penuel
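The idea of replacing a table with a function, and the two metrics, can be illustrated with a small numeric sketch. Everything here is an assumption made for illustration. The sketch uses a hypothetical two-state, two-action MDP (`next_state`, `reward`) and a hypothetical 2-dimensional feature map `features`. The policy is a softmax, π(a|s, θ) ∝ exp(θᵀφ(s, a)), so it is specified by two parameters rather than by a table of action probabilities. The sketch computes Metric 1, v̄_π = Σ_s d(s) v_π(s), and Metric 2, r̄_π = Σ_s d_π(s) r_π(s), taking d to be the stationary distribution d_π. In that case the two metrics satisfy r̄_π = (1 − γ) v̄_π.

```python
import math

GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]


def next_state(s, a):
    # Hypothetical deterministic dynamics: action 0 stays, action 1 switches state.
    return s if a == 0 else 1 - s


def reward(s, a):
    # Hypothetical reward: 1 for landing in state 1, else 0.
    return 1.0 if next_state(s, a) == 1 else 0.0


def features(s, a):
    # Hypothetical feature vector phi(s, a): the policy is described by the
    # 2 numbers in theta instead of a 2x2 table of probabilities.
    return [1.0, float(s)] if a == 1 else [0.0, 0.0]


def policy(theta, s):
    """pi(a | s, theta) = softmax over a of theta . phi(s, a)."""
    prefs = [sum(t * f for t, f in zip(theta, features(s, a))) for a in ACTIONS]
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def evaluate(theta, iters=2000):
    """Return (d_pi, v_pi, r_pi) for the policy with parameters theta."""
    pi = {s: policy(theta, s) for s in STATES}
    r_pi = [sum(pi[s][a] * reward(s, a) for a in ACTIONS) for s in STATES]
    P = [[0.0, 0.0] for _ in STATES]  # P[s][s'] = P_pi(s' | s)
    for s in STATES:
        for a in ACTIONS:
            P[s][next_state(s, a)] += pi[s][a]
    # Policy evaluation: iterate v <- r_pi + gamma * P_pi v (a contraction).
    v = [0.0, 0.0]
    for _ in range(iters):
        v = [r_pi[s] + GAMMA * sum(P[s][t] * v[t] for t in STATES) for s in STATES]
    # Stationary distribution: power iteration d^T <- d^T P_pi.
    d = [0.5, 0.5]
    for _ in range(iters):
        d = [sum(d[s] * P[s][t] for s in STATES) for t in STATES]
    return d, v, r_pi


def metrics(theta):
    d, v, r_pi = evaluate(theta)
    avg_value = sum(d[s] * v[s] for s in STATES)      # Metric 1 with d = d_pi
    avg_reward = sum(d[s] * r_pi[s] for s in STATES)  # Metric 2
    return avg_value, avg_reward
```

For example, `metrics([0.3, -0.2])` returns a pair whose second element equals (1 − 0.9) times the first, up to floating-point error. The relation follows from v_π = r_π + γP_π v_π and d_πᵀP_π = d_πᵀ.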