强化学习:连续控制问题中Actor-Critic算法的linear baseline
最近在看连续控制问题,看到了一个Actor-Critic算法中手动扩展features和设置linear baseline的方法,这些方法源自论文:《Benchmarking Deep Reinforcement Learning for Continuous Control》。
对于低维的features我们可以手动扩展:
代码实现:
return torch.cat([observations, observations ** 2, al, al ** 2, al ** 3, ones], dim=2)
-----------------------------------------------------
linear baseline,在AC算法中给Critic降低方差之用,给出一种简单的线性拟合方式,使用最小二乘法拟合:
代码:
def fit(self, episodes): # sequence_length * batch_size x feature_size featmat = self._feature(episodes).view(-1, self.feature_size) # sequence_length * batch_size x 1 returns = episodes.returns.view(-1, 1) reg_coeff = self._reg_coeff eye = torch.eye(self.feature_size, dtype=torch.float32, device=self.linear.weight.device) for _ in range(5): try: coeffs = torch.linalg.lstsq( torch.matmul(featmat.t(), featmat) + reg_coeff * eye, torch.matmul(featmat.t(), returns) ).solution break except RuntimeError: reg_coeff += 10 else: raise RuntimeError('Unable to solve the normal equations in ' '`LinearFeatureBaseline`. The matrix X^T*X (with X the design ' 'matrix) is not full-rank, regardless of the regularization ' '(maximum regularization: {0}).'.format(reg_coeff)) self.linear.weight.data = coeffs.data.t()
===============================================
详细代码地址:
https://gitee.com/devilmaycry812839668/MAML-Pytorch-RL/blob/master/maml_rl/baseline.py
本博客是博主个人学习时的一些记录,不保证是为原创,个别文章加入了转载的源地址,还有个别文章是汇总网上多份资料所成,在这之中也必有疏漏未加标注处,如有侵权请与博主联系。
如果未特殊标注则为原创,遵循 CC 4.0 BY-SA 版权协议。
posted on 2023-05-31 00:22 Angry_Panda 阅读(68) 评论(0) 编辑 收藏 举报