Post archive for June 6, 2022 - initial_h

June 6, 2022

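Summary: **Published:** 2016 (NIPS 2016) **Key points:** This paper proposes the Bootstrapped DQN algorithm for deep exploration. The authors argue that current exploration strategies, such as ϵ-greedy, do not carry out temporally-extended (or deep) exploration. Deep exp…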

posted @ 2022-06-06 23:46 initial_h Views (271) Comments (0) Recommendations (1)

Policy Distillation

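Summary: **Published:** 2016 (ICLR 2016) **Key points:** This paper considers extracting a policy from one RL policy network and transferring it to another policy network. In essence, this is knowledge transfer (Distillation is a method to transfer knowledge from a teacher m…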

posted @ 2022-06-06 23:44 initial_h Views (96) Comments (0) Recommendations (0)

initial_h

https://github.com/initial-h
