Post archive "July 2018" - initial_h

Paper walkthrough: "Playing hard exploration games by watching YouTube"

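Summary: 1. Abstract. Reinforcement learning methods traditionally struggle when the environment's rewards are very sparse. One effective remedy is to have a human demonstrator provide trajectories to imitate, which guide the direction of exploration. The usual practice is to watch a human …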

posted @ 2018-07-28 12:53 initial_h Views (1278) Comments (0) Recommended (0)

RuntimeWarning: invalid value encountered in true_divide

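Summary: This warning can appear when a NumPy computation divides 0 by 0. The message for 0/0 is not specific, so the cause is sometimes hard to spot. A case such as 1/0 produces a more specific message.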
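
The difference between the two cases can be reproduced with a short sketch. The helper name `divide_and_collect_warnings` is hypothetical and is not taken from the original post, and the exact wording of the messages varies between NumPy versions (for instance "true_divide" versus "divide"), so only the stable part of each message is relied on here.

```python
import warnings

import numpy as np


def divide_and_collect_warnings(numerator, denominator):
    """Divide two arrays element-wise and return (result, warning messages).

    Hypothetical helper for illustration only.
    """
    a = np.asarray(numerator, dtype=float)
    b = np.asarray(denominator, dtype=float)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        # Make sure NumPy reports both kinds of floating-point problems.
        with np.errstate(divide="warn", invalid="warn"):
            result = a / b
    return result, [str(w.message) for w in caught]


# 0/0 yields nan and the vague "invalid value encountered ..." warning.
result_nan, msgs_nan = divide_and_collect_warnings([0.0], [0.0])

# 1/0 yields inf and the more specific "divide by zero encountered ..." warning.
result_inf, msgs_inf = divide_and_collect_warnings([1.0], [0.0])

print(result_nan, msgs_nan)
print(result_inf, msgs_inf)
```

When tracking down the source of the warning in a larger program, one option is to wrap the suspect code in `np.errstate(invalid="raise")`, which turns the warning into a `FloatingPointError` with a traceback.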

posted @ 2018-07-25 16:17 initial_h Views (29715) Comments (0) Recommended (2)

Solving for value functions in an MDP

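Summary: MDP overview. The Markov decision process (MDP) is the most basic modeling framework in reinforcement learning. It places many restrictions on the sequential decision process: for example, there are only finitely many states $S_t$ and actions $a_t$, and the reward $R_t$ corresponding to $(S_t,a_t)$ …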
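
As a rough illustration of what solving for a value function can look like in the finite case, the sketch below evaluates a fixed policy on a tiny made-up example. The transition matrix `P`, reward vector `R`, discount factor `gamma`, and both function names are assumptions chosen for illustration; they do not come from the post, and this is not necessarily the method the post uses.

```python
import numpy as np

# Hypothetical 2-state process induced by some fixed policy (made-up numbers).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])  # P[s, s2] = probability of moving from s to s2
R = np.array([1.0, 0.0])    # expected immediate reward in each state
gamma = 0.9                 # discount factor (assumed)


def evaluate_exact(P, R, gamma):
    """Solve the Bellman expectation equation V = R + gamma * P V directly."""
    n = len(R)
    return np.linalg.solve(np.eye(n) - gamma * P, R)


def evaluate_iterative(P, R, gamma, tol=1e-10, max_iter=10_000):
    """Apply the Bellman backup repeatedly until the values stop changing."""
    V = np.zeros_like(R)
    for _ in range(max_iter):
        V_new = R + gamma * P @ V
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


V_exact = evaluate_exact(P, R, gamma)
V_iter = evaluate_iterative(P, R, gamma)
print(V_exact, V_iter)
```

The direct solve is only practical because the state space is finite and small; the iterative version relies on the backup being a contraction for `gamma < 1`.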

posted @ 2018-07-17 10:52 initial_h Views (4964) Comments (0) Recommended (1)

1. Two Sum (Python)

Summary: "1. Two Sum" Description: Given an array of integers, return indices of the two numbers such that they add up to a specific target. You may assume that …
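
One common way to approach the problem as described is a single pass with a dictionary. The sketch below is an illustration under that assumption, not necessarily the solution given in the post, and the function name `two_sum` is chosen here for convenience.

```python
def two_sum(nums, target):
    """Return indices [i, j] with nums[i] + nums[j] == target, or None."""
    seen = {}  # maps a value already visited to its index
    for i, x in enumerate(nums):
        j = seen.get(target - x)
        if j is not None:  # the complement appeared earlier at index j
            return [j, i]
        seen[x] = i
    return None  # no pair adds up to target


print(two_sum([2, 7, 11, 15], 9))
```

Each element is checked against the values seen so far, so the list is scanned once and an element is never paired with itself.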

posted @ 2018-07-07 20:15 initial_h Views (956) Comments (0) Recommended (0)

initial_h

https://github.com/initial-h
