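Summary: Published: 2016 (ICLR 2016). Key points: This paper proposes PER (Prioritized Experience Replay), a classic experience replay method that weights sampling by the temporal-difference (TD) error (Sequences associated with rewards appear to be re ... Read full text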
posted @ 2024-02-14 08:29 initial_h
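
To make the idea of weighting samples by TD error concrete, here is a minimal sketch of the proportional variant of prioritized sampling. The function names (`per_probabilities`, `sample_indices`) and the default values of `alpha`, `beta`, and `eps` are illustrative assumptions, not taken from the post, and a plain array is used in place of the sum-tree that efficient implementations typically rely on.

```python
import numpy as np


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Turn TD errors into sampling probabilities (hypothetical helper).

    priority_i = (|delta_i| + eps) ** alpha, then normalize to sum to 1.
    alpha = 0 gives uniform sampling; larger alpha prioritizes more strongly.
    """
    td_errors = np.asarray(td_errors, dtype=np.float64)
    priorities = (np.abs(td_errors) + eps) ** alpha
    return priorities / priorities.sum()


def sample_indices(td_errors, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Sample transition indices and importance-sampling weights (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    probs = per_probabilities(td_errors, alpha)
    n = len(probs)
    # Draw indices with probability proportional to priority.
    idx = rng.choice(n, size=batch_size, p=probs)
    # Importance-sampling weights correct the bias from non-uniform sampling;
    # dividing by the batch maximum keeps every weight in (0, 1].
    weights = (n * probs[idx]) ** (-beta)
    weights /= weights.max()
    return idx, weights


if __name__ == "__main__":
    errors = [0.1, 2.0, 0.5, 0.05]  # made-up TD errors for four transitions
    idx, w = sample_indices(errors, batch_size=8, rng=np.random.default_rng(0))
    print(per_probabilities(errors))
    print(idx, w)
```

Transitions with a larger absolute TD error are drawn more often, and the importance-sampling weights scale down their contribution to the loss so that the update is not dominated by them.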