Offline Reinforcement Learning (A Survey on Offline Reinforcement Learning)

Author: 凯鲁嘎吉 (kailugaji) - cnblogs (博客园) http://www.cnblogs.com/kailugaji/

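This post is based on two survey papers on offline reinforcement learning: "A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems" and "Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems". It gives a first look at offline RL: its concepts, its challenges, the related families of methods (outlined only, not covered in detail), and possible directions for future research. For more reinforcement learning posts, see the blog category "Reinforcement Learning" (随笔分类 - Reinforcement Learning).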

1. Introduction

1.1 Supervised Machine Learning, RL, and Off-policy RL

1.2 The Power of Offline RL

1.3 On-policy vs. Off-policy

1.4 On-policy, Off-policy, and Offline (Batch) RL

1.5 Imitation Learning, RL, and Offline RL

2. Challenges

3. Taxonomy

Illustration of the general structure of an offline RL algorithm

3.1 Policy Constraints

3.2 Importance Sampling
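
As a rough illustration of the idea behind this family of methods, the sketch below estimates the value of a target policy from trajectories collected by a different behavior policy, using ordinary (trajectory-wise) importance sampling. It assumes a small tabular problem; the function name `trajectory_is_estimate`, the array layout of `pi` and `beta`, and the toy data are all hypothetical choices made for this example and are not taken from the surveys.

```python
import numpy as np

def trajectory_is_estimate(trajectories, pi, beta, gamma=0.99):
    """Ordinary importance sampling estimate of the target policy's return.

    trajectories: list of trajectories, each a list of (state, action, reward)
                  tuples collected by the behavior policy.
    pi, beta:     arrays of shape (n_states, n_actions) holding the action
                  probabilities of the target and behavior policies
                  (hypothetical tabular representation).
    """
    estimates = []
    for traj in trajectories:
        weight = 1.0  # product of pi(a|s) / beta(a|s) along the trajectory
        ret = 0.0     # discounted return of the trajectory
        for t, (s, a, r) in enumerate(traj):
            weight *= pi[s, a] / beta[s, a]
            ret += (gamma ** t) * r
        estimates.append(weight * ret)
    return float(np.mean(estimates))

# Toy one-state, two-action problem: action 0 pays 1, action 1 pays 0.
beta = np.array([[0.5, 0.5]])           # behavior policy: uniform
pi = np.array([[1.0, 0.0]])             # target policy: always action 0
data = [[(0, 0, 1.0)], [(0, 1, 0.0)]]   # one logged trajectory per action
print(trajectory_is_estimate(data, pi, beta))
```

The estimator needs the behavior policy's probabilities for every logged action, and the product of ratios can grow or vanish quickly on long trajectories, so its variance is typically high.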

3.3 Regularization

3.4 Uncertainty Estimation

3.5 Model-based Methods

3.6 One-step Methods

3.7 Imitation Learning

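Imitation learning resources:

Tian Xu (许天), Ziniu Li (李子牛), Yang Yu (俞扬). A Concise Tutorial on Imitation Learning (模仿学习简洁教程), 2021. http://www.lamda.nju.edu.cn/xut/Imitation_Learning.pdf

[RLChina 2021] Lecture 10: Frontiers of Reinforcement Learning (II), Yang Yu (俞扬): https://www.bilibili.com/video/BV1qM4y1L7w9?spm_id_from=333.999.0.0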

3.8 Trajectory Optimization

4. Open Problems

5. References

[1] Rafael Figueiredo Prudencio, Marcos R. O. A. Maximo and Esther Luna Colombini. "A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems" (2022).

[2] Sergey Levine, Aviral Kumar, George Tucker and Justin Fu. "Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems" (2020).

[3] CS 285 Deep Reinforcement Learning https://rail.eecs.berkeley.edu/deeprlcourse/

[4] CS330 Fall 2021 Deep Multi-Task and Meta Learning https://cs330.stanford.edu/

[5] Offline (Batch) Reinforcement Learning: A Review of Literature and Applications https://danieltakeshi.github.io/2020/06/28/offline-rl/

[6] RL-Paper-notes https://github.com/2019ChenGong/RL-Paper-notes

[7] An Optimistic Perspective on Offline Reinforcement Learning https://offline-rl.github.io/

[8] Offline RL benchmark: https://github.com/rail-berkeley/d4rl

[9] [RLChina 2021] Lecture 9: Frontiers of Reinforcement Learning (I), Zongqing Lu (卢宗青): https://www.bilibili.com/video/BV1cQ4y1m7Nn?spm_id_from=333.999.0.0

[10] Offline Reinforcement Learning Resources, https://offlinerl.ai/

Posted on 2022-03-22 17:18 by 凯鲁嘎吉
