Summary:
The transition probability is unknown, and we do not even need to estimate it.
posted @ 2018-05-26 23:04
ecoflex
Summary:
Understand why correlated samples cause problems, and how parallelism solves them. Another solution is replay buffers, fully utilizing the advantage ...
posted @ 2018-05-26 19:57
ecoflex
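The replay-buffer idea mentioned in the excerpt above can be illustrated with a minimal sketch. It is not taken from the post itself: the class name `ReplayBuffer`, its methods, and the transition layout `(state, action, reward, next_state, done)` are assumptions chosen for illustration. The point it shows is that sampling uniformly at random from stored transitions breaks up the correlation between consecutive samples.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size buffer of past transitions (illustrative only)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement decorrelates consecutive steps.
        return random.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


# Usage: store five consecutive transitions in a buffer of capacity three,
# then draw a random minibatch of two.
buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(t, 0, 1.0, t + 1, False)
batch = buffer.sample(2)
```

Only the three most recent transitions survive here, so the minibatch is drawn from states 2, 3, and 4 rather than from the order in which they were collected.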
Summary:
In most actor-critic (AC) algorithms, we actually just fit the value function; it is less common to fit the Q-function as well. Batch: offline, Monte Carlo. Online: bootstrapping, TD in ...
posted @ 2018-05-26 12:28
ecoflex
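The batch (Monte Carlo) versus online (bootstrapped TD) distinction in the excerpt above can be made concrete with a small sketch of the regression targets used to fit a value function. The function names, the toy rewards, and the `next_values` critic estimates below are hypothetical, chosen only for illustration; the excerpt does not give an implementation.

```python
def monte_carlo_targets(rewards, gamma):
    """Discounted reward-to-go for each step of one finished episode."""
    targets = []
    running = 0.0
    # Walk backwards so each target reuses the one computed after it.
    for r in reversed(rewards):
        running = r + gamma * running
        targets.append(running)
    targets.reverse()
    return targets


def td_targets(rewards, next_values, gamma):
    """One-step bootstrapped targets: r_t + gamma * V(s_{t+1})."""
    return [r + gamma * v for r, v in zip(rewards, next_values)]


# Toy episode (assumed values). The last next-state value is 0.0 because
# the episode terminates there.
rewards = [1.0, 1.0, 1.0]
gamma = 0.5
next_values = [2.0, 4.0, 0.0]  # hypothetical critic estimates V(s_{t+1})

mc = monte_carlo_targets(rewards, gamma)      # [1.75, 1.5, 1.0]
td = td_targets(rewards, next_values, gamma)  # [2.0, 3.0, 1.0]
```

The Monte Carlo targets need the whole episode before any of them can be computed, while each TD target needs only a single transition plus the current critic estimate, which is why TD suits online updates.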
