Summary: probability density function; expectation; state s; action a; agent; policy π(a|s); reward r; state transition p(s'|s,a); return (cumulative future reward); discounted return (γ Read full post
posted @ 2023-05-09 17:26 阿Qi早起了吗 Views (87) Comments (0) Recommendations (0)