Summary: 1. Markov decision processes formally describe an environment for reinforcement learning, where the environment is fully observable. The current state co…
posted @ 2018-11-20 17:01 TaeYoon
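As a rough sketch (not the post's own code), a finite MDP is commonly written as a tuple ⟨S, A, P, R, γ⟩. The toy below uses made-up states `low`/`high`, made-up actions `wait`/`work`, and hypothetical helpers `step` and `rollout` to show the key point of the summary: sampling the next state needs only the current state and action, not the earlier history.

```python
import random
from dataclasses import dataclass


@dataclass
class MDP:
    """A finite MDP <S, A, P, R, gamma> (toy representation, not a library API)."""
    states: list
    actions: list
    P: dict       # P[s][a] -> list of (next_state, probability)
    R: dict       # R[s][a] -> expected immediate reward
    gamma: float  # discount factor in [0, 1]

    def validate(self) -> None:
        # Each P[s][a] must be a probability distribution over next states.
        for s in self.states:
            for a in self.actions:
                total = sum(p for _, p in self.P[s][a])
                assert abs(total - 1.0) < 1e-9, (s, a, total)

    def step(self, state, action, rng: random.Random):
        # The next state is drawn from P[state][action] alone; nothing before
        # `state` is consulted. This is the Markov property.
        next_states = [s2 for s2, _ in self.P[state][action]]
        weights = [p for _, p in self.P[state][action]]
        next_state = rng.choices(next_states, weights=weights, k=1)[0]
        return next_state, self.R[state][action]


def rollout(mdp, policy, start, steps, seed=0):
    """Sample a trajectory [(s, a, r), ...] under a deterministic policy dict."""
    rng = random.Random(seed)
    s, trajectory = start, []
    for _ in range(steps):
        a = policy[s]
        s_next, r = mdp.step(s, a, rng)
        trajectory.append((s, a, r))
        s = s_next
    return trajectory


# Hypothetical two-state example; all numbers are invented for illustration.
toy = MDP(
    states=["low", "high"],
    actions=["wait", "work"],
    P={
        "low": {"wait": [("low", 1.0)], "work": [("high", 0.7), ("low", 0.3)]},
        "high": {"wait": [("low", 0.4), ("high", 0.6)], "work": [("high", 1.0)]},
    },
    R={"low": {"wait": 0.0, "work": -1.0}, "high": {"wait": 1.0, "work": 2.0}},
    gamma=0.9,
)
toy.validate()
```

For instance, `rollout(toy, {"low": "work", "high": "work"}, "low", 5)` returns five `(state, action, reward)` tuples; once `high` is reached under `work` it is never left, because `P["high"]["work"]` puts all probability on `high`.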
Summary: 1. How reinforcement learning differs (from traditional supervised/unsupervised learning): there is no supervisor, only a reward signal (like a child learning by trial and error); feedback is delayed, not instantaneous (wrong…
posted @ 2018-11-20 16:59 TaeYoon
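To make "only a reward signal, delayed" concrete, here is a minimal sketch assuming a hypothetical corridor in which the agent always steps right and receives a reward of 1.0 only on the final step. No supervisor labels any individual move as correct; earlier steps learn about that late reward only through the discounted return G_t = r_{t+1} + γ·G_{t+1}. The helper names `corridor_episode` and `discounted_returns` are illustrative, not from the post.

```python
def corridor_episode(length):
    """Hypothetical corridor: always step right; reward 1.0 only on reaching the end.

    rewards[t] is the reward received after the move at time t (i.e. r_{t+1}).
    """
    rewards = []
    position = 0
    while position < length:
        position += 1
        rewards.append(1.0 if position == length else 0.0)
    return rewards


def discounted_returns(rewards, gamma):
    """Compute G_t = r_{t+1} + gamma * G_{t+1}, working backwards over one episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


rewards = corridor_episode(4)           # [0.0, 0.0, 0.0, 1.0]
print(discounted_returns(rewards, 0.5))  # [0.125, 0.25, 0.5, 1.0]
```

Every step but the last gets zero immediate reward, yet each still receives a nonzero return, shrinking by a factor of γ the further it is from the goal. This is one simple way delayed feedback gets credited back to earlier actions.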