Is Deep Reinforcement Learning Really Superhuman on Atari? Leveling the playing field

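Notice: the original paper is the one named in the title. In case of any infringement, please contact the author and this post will be withdrawn.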

https://arxiv.org/abs/1908.04683

Abstract

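Consistent and reproducible evaluation of Deep Reinforcement Learning (DRL) is not straightforward. In the Arcade Learning Environment (ALE), small changes in environment parameters, such as stochasticity or the maximum allowed play time, can lead to very different performance. In this work, we discuss the difficulties of comparing different agents trained on ALE. To take a further step towards reproducible and comparable DRL, we introduce SABER, a Standardized Atari BEnchmark for general Reinforcement learning algorithms. Our methodology extends previous recommendations and contains a complete set of environment parameters as well as training and test procedures. We then use SABER to evaluate the current state-of-the-art algorithm, Rainbow. Furthermore, we introduce a human world records baseline and argue that previous claims of expert or superhuman performance of DRL might not be accurate. Finally, we propose Rainbow-IQN by extending Rainbow with Implicit Quantile Networks (IQN), leading to new state-of-the-art performance. Source code is available for reproducibility.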
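To illustrate why the choice of human baseline matters for "superhuman" claims, here is a minimal sketch of baseline-normalized scoring. It assumes the common convention of normalizing an agent's score between a random-play score and a reference human score; the function name and all numbers are hypothetical and are not taken from the paper.

```python
def normalized_score(agent: float, random_play: float, reference: float) -> float:
    """Return the agent's score normalized so that random play maps to 0
    and the reference (human) score maps to 1."""
    if reference == random_play:
        raise ValueError("reference and random-play scores must differ")
    return (agent - random_play) / (reference - random_play)


# Hypothetical scores for a single game (illustrative values only).
agent_score = 600.0
random_score = 100.0
typical_human_score = 1100.0   # assumed "ordinary human tester" baseline
world_record_score = 10100.0   # assumed world-record baseline

vs_typical_human = normalized_score(agent_score, random_score, typical_human_score)
vs_world_record = normalized_score(agent_score, random_score, world_record_score)

print(f"vs. typical human: {vs_typical_human:.2%}")
print(f"vs. world record:  {vs_world_record:.2%}")
```

The same raw score looks far less impressive once the reference is a world record rather than an ordinary human tester, which is the kind of gap the abstract points to.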

1 Introduction

1.1 Related Work

2 Challenges when Comparing Performance on the Atari Benchmark

2.1 Revisiting ALE: an Initial Step towards Standardization

2.2 Maximum Episode Length

Glitch and bug in the ALE environment

2.3 Human World Records Baseline

3 SABER: a Standardized Atari BEnchmark for Reinforcement learning

3.1 Training and Evaluation Procedures

3.2 Reporting Results

4 Rainbow-IQN

5 Experiments

5.1 Rainbow Evaluation

5.2 Rainbow-IQN: Evaluation and Comparison

Influence of maximum episode length

Comparison to Rainbow

Comparison to DQN

5.3 Stability of both Rainbow and Rainbow-IQN

6 Conclusion: why is RL that Bad at Atari Games?

Posted on 2023-01-14 19:48 by 穷酸秀才大草包
