RL - METRA: Scalable Unsupervised RL with Metric-Aware Abstraction
Author: 凯鲁嘎吉 (kailugaji) - 博客园 (cnblogs) http://www.cnblogs.com/kailugaji/
I recently read a paper published at ICLR 2024, "METRA: Scalable Unsupervised RL with Metric-Aware Abstraction". This post approaches METRA's motivation from the perspective of the Wasserstein GAN, a variant of the generative adversarial network. It first reviews the necessary background: KL divergence, JS divergence, the Wasserstein distance, the Lipschitz condition, entropy, joint entropy, conditional entropy, forward and reverse mutual information, relative entropy, and Jensen's inequality. It then traces where METRA comes from through a side-by-side comparison of Wasserstein GAN and METRA. Finally, it walks through METRA's derivations in detail, along with its algorithm, the intuition behind it, and its connections to the DIAYN, DADS, and CIC methods.
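As a small illustration of why the post starts from the Wasserstein GAN: when two distributions have disjoint supports, the KL divergence blows up to infinity (and JS saturates at log 2), so neither gives a useful training signal, while the Wasserstein-1 distance still varies smoothly with the offset between the distributions. Below is a minimal NumPy sketch of this contrast; the helper names `kl` and `w1` are my own, not from the paper.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions on a shared grid.

    Returns inf when q has zero mass somewhere p has support."""
    mask = p > 0
    if np.any(q[mask] == 0):
        return float("inf")
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def w1(p, q, x):
    """1-D Wasserstein-1 distance: the integral of |CDF_p - CDF_q| over x."""
    cdf_diff = np.abs(np.cumsum(p) - np.cumsum(q))
    return float(np.sum(cdf_diff[:-1] * np.diff(x)))

x = np.linspace(0.0, 1.0, 101)      # grid with spacing 0.01
p = np.zeros_like(x); p[0] = 1.0    # point mass at x = 0.0
q = np.zeros_like(x); q[50] = 1.0   # point mass at x = 0.5

print(kl(p, q))     # inf  -> no gradient signal from KL
print(w1(p, q, x))  # 0.5  -> grows smoothly with the offset
```

This is exactly the failure mode WGAN works around via the Kantorovich-Rubinstein duality, W1(p, q) = sup over 1-Lipschitz f of E_p[f] - E_q[f], and it is the same Lipschitz-style constraint that reappears in METRA's objective.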
References:
[1] Seohong Park, Oleh Rybkin, and Sergey Levine. METRA: Scalable Unsupervised RL with Metric-Aware Abstraction. In International Conference on Learning Representations (ICLR), 2024.
[2] Mean-field theory: 凯鲁嘎吉 - https://www.cnblogs.com/kailugaji/p/10692797.html, https://www.cnblogs.com/kailugaji/p/12463966.html
[3] Generative adversarial networks (GAN and W-GAN): 凯鲁嘎吉 - https://www.cnblogs.com/kailugaji/p/15352841.html
[4] Definition of asymmetric metrics (quasimetrics): 凯鲁嘎吉 - https://www.cnblogs.com/kailugaji/p/19210601
[5] DIAYN:Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, and Sergey Levine. Diversity is all you need: Learning skills without a reward function. In International Conference on Learning Representations (ICLR), 2019.
[6] DADS:Archit Sharma, Shixiang Gu, Sergey Levine, Vikash Kumar, and Karol Hausman. Dynamics aware unsupervised discovery of skills. In International Conference on Learning Representations (ICLR), 2020.
[7] CIC: Michael Laskin, Hao Liu, Xue Bin Peng, Denis Yarats, Aravind Rajeswaran, and Pieter Abbeel. Unsupervised reinforcement learning with contrastive intrinsic control. In Neural Information Processing Systems (NeurIPS), 2022.