Offline RL | Deadly Triad issue

Recommended reading: Why is there a Deadly Triad issue and how to handle it?

Bootstrapping
Off-policy learning
Function approximation
When these three are combined, the learned value function can become unstable or diverge.
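A minimal sketch of this instability, assuming a common two-state toy problem rather than anything specific to this post: two states share one weight w through linear features (V(s1) = w, V(s2) = 2w), every reward is 0, and only the s1 -> s2 transition is ever updated. The function name, step size, and discount values are illustrative assumptions. The true values are 0, yet the weight grows whenever gamma > 0.5.

```python
def run_two_state_td(gamma=0.99, alpha=0.1, steps=100, w0=1.0):
    """Semi-gradient TD(0) on a single transition s1 -> s2 with reward 0.

    All three triad ingredients are present:
      - function approximation: V(s1) = 1 * w and V(s2) = 2 * w share w
      - bootstrapping: the target uses the current estimate V(s2)
      - off-policy data: only s1 -> s2 is sampled, s2 is never updated
    """
    w = w0
    history = [w]
    for _ in range(steps):
        v_s1 = 1.0 * w
        v_s2 = 2.0 * w
        td_error = 0.0 + gamma * v_s2 - v_s1  # reward is 0
        w += alpha * td_error * 1.0           # dV(s1)/dw = 1
        history.append(w)
    return history


if __name__ == "__main__":
    diverging = run_two_state_td(gamma=0.99)
    shrinking = run_two_state_td(gamma=0.4)
    print("gamma=0.99, final w:", diverging[-1])
    print("gamma=0.40, final w:", shrinking[-1])
```

Each step multiplies w by 1 + alpha * (2 * gamma - 1), so the estimate blows up for gamma > 0.5 and decays toward the true value 0 otherwise.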

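Why:

Function approximation generalizes across state-action pairs, so it may implicitly embed Q(S, a) ≈ Q(S', a1). An update to Q(S, a) then also shifts Q(S', a1), which is the very value the bootstrapped target is built from, and off-policy data may never visit (S', a1) to correct it.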
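The coupling can be seen in a small sketch, assuming a linear Q-function with hand-picked feature vectors (the features, step size, and target below are hypothetical, chosen only so that the two pairs overlap in one component). Updating the weights toward a target for (S, a) alone also moves the estimate for (S', a1).

```python
def q_value(weights, features):
    """Linear Q estimate: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def coupled_update_demo(alpha=0.5, target=1.0):
    # Hypothetical features; both pairs share the first component.
    phi_s_a = [1.0, 0.0]     # features of (S, a)
    phi_s2_a1 = [1.0, 0.5]   # features of (S', a1)
    weights = [0.0, 0.0]

    before = q_value(weights, phi_s2_a1)

    # One gradient step that only tries to fix Q(S, a).
    error = target - q_value(weights, phi_s_a)
    weights = [w + alpha * error * x for w, x in zip(weights, phi_s_a)]

    after = q_value(weights, phi_s2_a1)
    return before, after


if __name__ == "__main__":
    before, after = coupled_update_demo()
    print("Q(S', a1) before:", before, "after:", after)
```

With a tabular representation the two entries would be independent and Q(S', a1) would stay unchanged; the shift here comes purely from the shared weights.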

posted @ 2025-02-13 18:44 霜尘FrostDust
