Archive: 04 2021
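Reinforcement learning note 2: the difference between value iteration and policy iteration, and between MC and TD
Summary: The difference between value iteration and policy iteration. Value iteration: (1) repeatedly iterate the Bellman optimality equation and the Bellman equation until the value function converges; (2) then substitute the value function into the Bellman equation to obtain the action-value function, and pick the policy from the action with the largest action value. (The policy does not take part in the iteration.) policyiterati Read more
posted @ 2021-04-29 11:14 A2he Views (616) Comments (0) Recommendations (0)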
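The contrast in the summary above can be sketched in code. This is only a minimal illustration, not the post's own code: the two-state `MDP` table, the state and action names, `gamma`, and `tol` are all made-up assumptions. Value iteration updates only the value function and reads the policy off at the end, while policy iteration alternates evaluating the current policy and improving it.

```python
# Hypothetical toy MDP (an assumption for illustration only):
# MDP[state][action] = list of (probability, next_state, reward)
MDP = {
    "A": {"stay": [(1.0, "A", 0.0)], "move": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "move": [(1.0, "A", 0.0)]},
}


def q_value(mdp, V, s, a, gamma):
    """Action value Q(s, a) computed from a state-value table V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in mdp[s][a])


def value_iteration(mdp, gamma=0.9, tol=1e-8):
    V = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s in mdp:
            # Bellman optimality backup: no explicit policy is involved.
            best = max(q_value(mdp, V, s, a, gamma) for a in mdp[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Only after convergence: greedy policy from the action values.
    policy = {s: max(mdp[s], key=lambda a: q_value(mdp, V, s, a, gamma))
              for s in mdp}
    return V, policy


def policy_iteration(mdp, gamma=0.9, tol=1e-8):
    policy = {s: next(iter(mdp[s])) for s in mdp}  # arbitrary start
    V = {s: 0.0 for s in mdp}
    while True:
        # Policy evaluation: Bellman expectation backup for the fixed policy.
        while True:
            delta = 0.0
            for s in mdp:
                v = q_value(mdp, V, s, policy[s], gamma)
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: act greedily with respect to the evaluated V.
        stable = True
        for s in mdp:
            best = max(mdp[s], key=lambda a: q_value(mdp, V, s, a, gamma))
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:
            return V, policy
```

On this toy table, calling `value_iteration(MDP)` and `policy_iteration(MDP)` should lead to the same greedy policy; they differ only in whether a policy is maintained during the iteration.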
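Reinforcement learning note 1: definitions of, and differences between, the functions of the Markov reward process (MRP) and the Markov decision process (MDP)
Summary: Markov reward process (MRP). State transition function: \(P\left(S_{t+1}=s^{\prime} \mid s_{t}=s\right)\) Reward function: \(R\left(s_{t}=s\right)=\mathbb{E}\left[r_{t} \mid s_{t}=s\right]\) Return: \(\ma Read more
posted @ 2021-04-27 21:20 A2he Views (451) Comments (0) Recommendations (0)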
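Visual Studio error: 0xC0000005: Access violation writing location 0xCCCCCCCC
Summary: The called function had no return statement. Awkward....... Read more
posted @ 2021-04-25 09:38 A2he Views (497) Comments (0) Recommendations (0)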
Surjective (onto) and injective (one-to-one) mappings
Summary: Onto (surjective): A mapping \(T: \mathbb{R}^{n} \rightarrow \mathbb{R}^{m}\) is said to be onto \(\mathbb{R}^{m}\) if each \(\mathbf{b}\) in \(\mathbb{R}^{m}\) is th Read more
posted @ 2021-04-24 11:22 A2he Views (2207) Comments (0) Recommendations (0)
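For a matrix transformation \(T(\mathbf{x}) = A\mathbf{x}\), the two properties in the title above can be checked from the rank of \(A\): \(T\) is onto \(\mathbb{R}^{m}\) when the rank equals the number of rows \(m\), and one-to-one when the rank equals the number of columns \(n\). The sketch below assumes \(T\) is given by such a matrix; the helper names `rank`, `is_onto`, and `is_one_to_one` are illustrative and do not come from the post.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) via exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        # Look for a nonzero entry in column c at or below row r.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate the entries below the pivot.
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def is_onto(rows):
    """T(x) = Ax maps R^n onto R^m iff A has a pivot in every row."""
    return rank(rows) == len(rows)


def is_one_to_one(rows):
    """T(x) = Ax is one-to-one iff A has a pivot in every column."""
    return rank(rows) == len(rows[0])


# A maps R^3 to R^2; B maps R^2 to R^3 (both matrices are made-up examples).
A = [[1, 0, 0], [0, 1, 0]]
B = [[1, 0], [0, 1], [1, 1]]
```

Here `A` has a pivot in each of its two rows but not in its third column, while `B` has a pivot in each column but not in every row, so the two matrices separate the two properties.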
linear transformation
Summary: Ax=b and Au=0 describe transformations from x to b and from u to 0. Denote such a transformation by T; we call it a linear transformation, for Read more
posted @ 2021-04-23 19:58 A2he Views (152) Comments (0) Recommendations (0)
solution set and span
Summary: Homogeneous linear system: as illustrated just below, the solution set is Span{u,v}. Nonhomogeneous system \(Ax=p\): suppose that v is the solution of Read more
posted @ 2021-04-23 17:43 A2he Views (56) Comments (0) Recommendations (0)
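Pulling code and pushing a branch
Summary: git clone https:*** List branches: git branch Create a branch: git checkout -b Newbranch Push a branch: git push origin Newbranch Reference commands Read more
posted @ 2021-04-23 15:26 A2he Views (56) Comments (0) Recommendations (0)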
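MATLAB error: Error using deal (line 37) The number of outputs should match the number of inputs.
Summary: matlabFunction converts a symbolic expression into a function handle. When two expressions are passed in as arguments, using the function handle produces the following error: Error using deal (line 37) The number of outputs should match the number of inputs. The offending code is: syms x y r = sqrt(x^2 + y^2); ht = matlabFunctio Read more
posted @ 2021-04-22 12:03 A2he Views (755) Comments (0) Recommendations (0)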
A Geometric Description of Span
Summary: Span one vector to a line. Let \(\mathbf{v}\) be a nonzero vector in \(\mathbb{R}^{3}\). Then \(\operatorname{Span}\{\mathbf{v}\}\) is the set of all Read more
posted @ 2021-04-21 22:08 A2he Views (131) Comments (0) Recommendations (0)