
RL——METRA: Scalable Unsupervised RL with Metric-Aware Abstraction

Author: 凯鲁嘎吉 - 博客园 http://www.cnblogs.com/kailugaji/

        Recently, I read a paper published at ICLR 2024, "METRA: Scalable Unsupervised RL with Metric-Aware Abstraction". This post looks at the motivation behind METRA from the perspective of the Wasserstein GAN, a variant of the generative adversarial network. It first reviews the necessary background: the KL divergence, JS divergence, Wasserstein distance, Lipschitz condition, information entropy, joint entropy, conditional entropy, the forward and reverse forms of mutual information, relative entropy, and Jensen's inequality. It then places the Wasserstein GAN and METRA side by side to explain where METRA comes from, and finally walks through METRA's derivation, algorithm flow, intuition, and its connections to DIAYN, DADS, and CIC.
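For quick reference, the background quantities listed above can be written in their standard forms; this is a minimal summary of textbook definitions, not a transcription of the slides:

$$D_{\mathrm{KL}}(p\,\|\,q)=\mathbb{E}_{x\sim p}\!\left[\log\frac{p(x)}{q(x)}\right],\qquad D_{\mathrm{JS}}(p\,\|\,q)=\frac{1}{2}D_{\mathrm{KL}}\!\left(p\,\Big\|\,\frac{p+q}{2}\right)+\frac{1}{2}D_{\mathrm{KL}}\!\left(q\,\Big\|\,\frac{p+q}{2}\right)$$

$$W_1(p,q)=\sup_{\|f\|_{L}\le 1}\;\mathbb{E}_{x\sim p}[f(x)]-\mathbb{E}_{x\sim q}[f(x)]\qquad\text{(Kantorovich-Rubinstein dual, }f\text{ 1-Lipschitz)}$$

$$I(S;Z)=H(Z)-H(Z\mid S)=H(S)-H(S\mid Z),\qquad f(\mathbb{E}[X])\le\mathbb{E}[f(X)]\ \text{for convex }f\ \text{(Jensen)}$$

Here relative entropy is the same quantity as the KL divergence. The two decompositions of mutual information are the "reverse" form $H(Z)-H(Z\mid S)$, which is estimated with a skill discriminator in DIAYN [5], and the "forward" form $H(S)-H(S\mid Z)$, whose conditional version underlies the state-prediction objective of DADS [6].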

Slides 1–10
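As a rough sketch of the side-by-side comparison described above, based on the W-GAN formulation in [3] and the METRA objective as stated in [1] rather than on the slide images, the two problems share the same Kantorovich-Rubinstein structure:

$$\text{W-GAN:}\quad \min_{G}\ \max_{\|D\|_{L}\le 1}\ \mathbb{E}_{x\sim p_{\mathrm{data}}}[D(x)]-\mathbb{E}_{z\sim p(z)}[D(G(z))]$$

$$\text{METRA:}\quad \max_{\pi,\,\phi}\ \mathbb{E}_{p(\tau,z)}\!\big[(\phi(s_{t+1})-\phi(s_t))^{\top}z\big]\quad\text{s.t.}\quad \|\phi(s)-\phi(s')\|_2\le 1\ \ \forall (s,s')\in\mathcal{S}_{\mathrm{adj}}$$

In both cases a 1-Lipschitz "critic" turns a distance between distributions into a tractable maximization: W-GAN constrains the discriminator $D$ under the Euclidean metric on data space, whereas METRA constrains the state representation $\phi$ under the temporal distance (adjacent states may move at most one unit in latent space), so that maximizing the inner product with the skill vector $z$ pushes different skills toward directions that are far apart in temporal distance.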

References:

[1] Seohong Park, Oleh Rybkin, and Sergey Levine. METRA: Scalable Unsupervised RL with Metric-Aware Abstraction. In International Conference on Learning Representations (ICLR), 2024.

[2] Mean-field theory: 凯鲁嘎吉 - https://www.cnblogs.com/kailugaji/p/10692797.html, https://www.cnblogs.com/kailugaji/p/12463966.html

[3] Generative adversarial networks (GAN and W-GAN): 凯鲁嘎吉 - https://www.cnblogs.com/kailugaji/p/15352841.html

[4] Definition of asymmetric metrics (quasimetrics): 凯鲁嘎吉 - https://www.cnblogs.com/kailugaji/p/19210601

[5] DIAYN: Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, and Sergey Levine. Diversity is all you need: Learning skills without a reward function. In International Conference on Learning Representations (ICLR), 2019.

[6] DADS: Archit Sharma, Shixiang Gu, Sergey Levine, Vikash Kumar, and Karol Hausman. Dynamics-aware unsupervised discovery of skills. In International Conference on Learning Representations (ICLR), 2020.

[7] CIC: Michael Laskin, Hao Liu, Xue Bin Peng, Denis Yarats, Aravind Rajeswaran, and Pieter Abbeel. Unsupervised reinforcement learning with contrastive intrinsic control. In Neural Information Processing Systems (NeurIPS), 2022.
