Summary:
## TD learning of state values
The data/experience required by the algorithm:
- $\left(s_0, r_1, s_1, \ldots, s_t, r_{t+1}, s_{t+1}, \ldots\right)$ or …
posted @ 2023-08-13 16:47
鸽鸽的书房
Summary:
### The discounted return
$$
\begin{aligned}
G_t & =R_{t+1}+\gamma R_{t+2}+\gamma^2 R_{t+3}+\ldots \\
& =R_{t+1}+\gamma\left(R_{t+2}+\gamma R_{t+3}+\ldots\right)
\end{aligned}
$$
…
posted @ 2023-08-13 16:05
鸽鸽的书房
Summary:
# 1.7 Markov decision processes
This section presents these concepts in a more formal way under the framework of Markov decision processes (MDPs). An …
posted @ 2023-08-13 15:30
鸽鸽的书房