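Reinforcement learning algorithms - TRPO - a trick for computing the Fisher information matrix from the KL divergence - estimating the Fisher information matrix (FIM) with 10% of the data
Related: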
https://digitalassets.lib.berkeley.edu/techreports/ucb/text/EECS-2016-217.pdf
Hence, a naïve implementation would spend more than 90% of the computational effort on these Fisher-vector products. However, we can greatly reduce this burden by subsampling the data for the computation of Fisher-vector product. Since the Fisher information matrix merely acts as a metric, it can be computed on a subset of the data without severely degrading the quality of the final step. Hence, we can compute it on 10% of the data, and the total cost of Hessian-vector products will be about the same as computing the gradient. With this optimization, the computation of a natural gradient step A⁻¹g does not incur a significant extra computational cost beyond computing the gradient g.
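
The subsampling idea in the passage above can be sketched on a toy problem. This is only a minimal illustration under my own assumptions, not the code from the thesis: the policy is a linear-softmax policy pi(a|s) = softmax(W s), the Fisher-vector product uses the closed form for that policy instead of automatic differentiation of the KL divergence, and a small damping term is added so the system is well conditioned. All names (`W`, `states`, `advantages`, `fisher_vector_product`, `conjugate_gradient`) are hypothetical, and the data is random. The gradient uses all samples; the Fisher-vector products inside conjugate gradient use either all samples or a random 10% subset.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fisher_vector_product(W, states, V, damping=1e-2):
    """Return (F + damping * I) V for pi(a|s) = softmax(W s), averaged over `states`."""
    p = softmax(states @ W.T)                          # (N, A) action probabilities
    u = states @ V.T                                   # (N, A) change in logits along V
    w = p * (u - (p * u).sum(axis=1, keepdims=True))   # (diag(p) - p p^T) u per state
    return w.T @ states / len(states) + damping * V    # (A, D), same shape as W

def conjugate_gradient(matvec, g, iters=10, tol=1e-10):
    """Approximately solve matvec(x) = g using only matrix-vector products."""
    x = np.zeros_like(g)
    r = g.copy()   # residual g - matvec(x), with x = 0
    d = g.copy()   # search direction
    rr = np.sum(r * r)
    for _ in range(iters):
        Ad = matvec(d)
        alpha = rr / np.sum(d * Ad)
        x += alpha * d
        r -= alpha * Ad
        rr_new = np.sum(r * r)
        if rr_new < tol:
            break
        d = r + (rr_new / rr) * d
        rr = rr_new
    return x

rng = np.random.default_rng(0)
N, D, A = 2000, 8, 4                       # samples, state features, actions (toy sizes)
states = rng.normal(size=(N, D))
W = 0.1 * rng.normal(size=(A, D))

# Stand-in policy gradient g: mean of advantage * grad log pi(a|s), with random advantages.
probs = softmax(states @ W.T)
actions = np.array([rng.choice(A, p=p) for p in probs])
advantages = rng.normal(size=N)
g = ((np.eye(A)[actions] - probs) * advantages[:, None]).T @ states / N

# Fisher-vector products on all data versus on a random 10% subset.
subset = states[rng.choice(N, size=N // 10, replace=False)]
step_full = conjugate_gradient(lambda V: fisher_vector_product(W, states, V), g)
step_sub = conjugate_gradient(lambda V: fisher_vector_product(W, subset, V), g)

cosine = np.sum(step_full * step_sub) / (
    np.linalg.norm(step_full) * np.linalg.norm(step_sub))
print("cosine similarity between full-data and 10%-data steps:", cosine)
```

Each conjugate gradient iteration calls `fisher_vector_product` once, so with 10 iterations on 10% of the data the Fisher-vector products touch roughly as many samples as one pass over the full batch, which is the cost argument made in the quoted text. The printed cosine similarity gives a rough sense of how far the subsampled step direction drifts from the full-data one on this toy problem; it says nothing definitive about real TRPO runs.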

Explanation given by Doubao AI:




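This blog is a record of the author's personal study notes and is not guaranteed to be original. Some posts include the source address of reposted material, and some are compiled from several online sources, so there are bound to be places where attribution is missing. If anything here infringes your rights, please contact the author.
Unless otherwise noted, posts are original and licensed under the CC 4.0 BY-SA license.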
posted on 2026-04-25 13:13 Angry_Panda