Summary:
In policy gradient methods, the action "a" is usually written as "u". Use this new form to estimate how good the update is. If all three paths show positive reward, shou (read more)
posted @ 2018-04-30 20:37
ecoflex
Summary:
https://www.youtube.com/watch?v=fevMOp5TDQs http://www.denizyuret.com/2015/03/alec-radfords-animations-for.html Atari is not an MDP, but MDP method wo (read more)
posted @ 2018-04-30 16:41
ecoflex
Summary:
... (read more)
posted @ 2018-04-30 16:03
ecoflex
Summary:
https://www.youtube.com/watch?v=qaMdN6LS9rA https://drive.google.com/file/d/0BxXI_RttTZAhVXBlMUVkQ1BVVDQ/view match: a4 b1 c2 d3 a The middle one is c (read more)
posted @ 2018-04-30 14:52
ecoflex
