Loss is its own Reward: Self-Supervision for Reinforcement Learning

The authors use signals such as the action, reward, and state as labels for supervised training.
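To make the idea concrete, here is a minimal sketch of how a single transition can supply its own labels. It assumes two illustrative auxiliary tasks: predicting the sign of the reward from the state, and predicting the action from the pair (state, next state). The helper names (`reward_label`, `linear`, `auxiliary_losses`) and the linear "heads" are hypothetical stand-ins chosen for this note, not the authors' actual networks or code.

```python
import math

def reward_label(reward):
    """Bin a scalar reward into a class: 0 = negative, 1 = zero, 2 = positive."""
    return 0 if reward < 0 else (1 if reward == 0 else 2)

def cross_entropy(logits, label):
    """Cross-entropy of an integer label under softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[label]

def linear(weights, features):
    """One logit per row of weights (hypothetical stand-in for a network head)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def auxiliary_losses(transition, w_reward, w_inverse):
    """Self-supervised losses whose labels come from the transition itself."""
    state, action, reward, next_state = transition
    # Reward task: the label is the binned reward, the input is the state.
    reward_loss = cross_entropy(linear(w_reward, state), reward_label(reward))
    # Inverse-dynamics task: the label is the action taken,
    # the input is the state concatenated with the next state.
    inverse_loss = cross_entropy(linear(w_inverse, state + next_state), action)
    return reward_loss, inverse_loss

# One toy transition: (state, action, reward, next_state), with 3 possible actions.
transition = ([0.5, -1.0], 2, 1.0, [0.7, -0.8])
w_reward = [[0.0, 0.0] for _ in range(3)]   # 3 reward classes, 2 state features
w_inverse = [[0.0] * 4 for _ in range(3)]   # 3 actions, 4 concatenated features
reward_loss, inverse_loss = auxiliary_losses(transition, w_reward, w_inverse)
print(reward_loss, inverse_loss)
```

In a real agent these losses would presumably be added to the reinforcement learning loss and share a representation with the policy; here the weights are left at zero just to show where the labels come from.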

 

posted @ 2018-03-12 17:37  Shiyu_Huang