Deep RL Bootcamp Lecture 7: SVG, DDPG, and Stochastic Computation Graphs

^ is the square root of epsilon

A simplified version of the hard version.

A smoother way to find the correct solution.

The first term is the REINFORCE (score-function) term, and the second term is the direct gradient of our loss.
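The slide itself is not reproduced here. Assuming it shows the usual decomposition ∇θ E_{x~p(x;θ)}[f(x,θ)] = E[f(x,θ) ∇θ log p(x;θ) + ∇θ f(x,θ)], the minimal Python sketch below checks it numerically on a hypothetical toy problem: x ~ N(θ, 1) and f(x,θ) = θx, so E[f] = θ² and the true gradient is 2θ. The function name and the toy loss are illustrative assumptions, not from the lecture.

```python
import random


def combined_gradient_estimate(theta: float, n_samples: int = 200_000,
                               seed: int = 0, include_reinforce: bool = True) -> float:
    """Monte Carlo estimate of d/dtheta E_{x ~ N(theta, 1)}[f(x, theta)] for the toy loss f = theta * x.

    Averages f(x, theta) * d/dtheta log p(x; theta) + d/dtheta f(x, theta) over samples.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(theta, 1.0)      # x ~ p(x; theta) = N(theta, 1)
        f = theta * x                  # toy loss: depends on theta directly and through x
        score = x - theta              # d/dtheta log N(x; theta, 1)
        direct = x                     # d/dtheta f(x, theta) with x held fixed
        estimate = direct
        if include_reinforce:
            estimate += f * score      # REINFORCE (score-function) term
        total += estimate
    return total / n_samples


# Exact answer for this toy problem: E[f] = theta**2, so the gradient is 2 * theta.
theta = 1.5
print(combined_gradient_estimate(theta), 2 * theta)
```

Setting include_reinforce=False keeps only the direct term, which converges to θ (1.5 here) instead of 2θ. This shows why the REINFORCE term is needed whenever the sampling distribution of x depends on θ.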

b is a stochastic node 
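When a stochastic node such as b is discrete, a cost downstream of b can only pass gradient back to θ through log p(b | θ); there is no pathwise derivative through the sample. A minimal sketch, assuming a hypothetical toy graph θ → b ~ Bernoulli(sigmoid(θ)) → cost(b) = (b - 0.25)² (all names are illustrative, not from the lecture), compares the score-function estimate with the exact gradient:

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def cost(b: int) -> float:
    # hypothetical downstream cost that depends on the stochastic node b
    return (b - 0.25) ** 2


def score_function_gradient(theta: float, n_samples: int = 100_000, seed: int = 0) -> float:
    """Estimate d/dtheta E_{b ~ Bernoulli(sigmoid(theta))}[cost(b)] with the score-function estimator."""
    rng = random.Random(seed)
    p = sigmoid(theta)
    total = 0.0
    for _ in range(n_samples):
        b = 1 if rng.random() < p else 0   # sample the stochastic node b
        score = b - p                      # d/dtheta log Bernoulli(b; sigmoid(theta))
        total += cost(b) * score           # gradient reaches theta only through log p(b | theta)
    return total / n_samples


def exact_gradient(theta: float) -> float:
    # E[cost] = p * cost(1) + (1 - p) * cost(0), so dE/dtheta = (cost(1) - cost(0)) * p * (1 - p)
    p = sigmoid(theta)
    return (cost(1) - cost(0)) * p * (1.0 - p)


print(score_function_gradient(0.4), exact_gradient(0.4))
```

Both numbers should come out close to 0.12; the Monte Carlo estimate is unbiased but carries sampling noise.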

Further formula derivations are omitted.

posted @ 2018-05-01 22:38  ecoflex