Offline RL | Deadly Triad issue

Recommended reading: Why is there a Deadly Triad issue and how to handle it?

  • Bootstrapping
  • Off-policy learning
  • Function approximations

When all three are combined, the value function estimates can become unstable or even diverge.
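A minimal sketch of the instability, modeled on the well-known two-state example discussed in Sutton & Barto (2nd ed., Chapter 11). It assumes two states whose values share a single weight `w` (V(s1) = w, V(s2) = 2w), reward 0 on the transition s1 → s2, and off-policy data that only ever contains that one transition. The function name `semi_gradient_td` is hypothetical. Each update multiplies `w` by (1 + α(2γ − 1)), so for γ > 0.5 the estimate grows without bound even though every true value is 0.

```python
def semi_gradient_td(w: float, alpha: float, gamma: float, steps: int) -> float:
    """Repeated semi-gradient TD(0) updates on the single off-policy transition s1 -> s2.

    Assumed setup (illustrative, not from the original post):
      features phi(s1) = 1, phi(s2) = 2, so V(s1) = w and V(s2) = 2w (function approximation)
      reward 0 on s1 -> s2; s2 is never updated (off-policy sampling)
    """
    phi_s1, phi_s2 = 1.0, 2.0
    for _ in range(steps):
        target = 0.0 + gamma * phi_s2 * w   # bootstrapping: target uses the current estimate
        td_error = target - phi_s1 * w
        w += alpha * td_error * phi_s1      # semi-gradient step on the shared weight
    return w


# gamma > 0.5: w is multiplied by 1.08 per step and blows up
print(semi_gradient_td(1.0, alpha=0.1, gamma=0.9, steps=100))
# gamma < 0.5: w is multiplied by 0.98 per step and decays toward 0
print(semi_gradient_td(1.0, alpha=0.1, gamma=0.4, steps=100))
```

Removing any one ingredient breaks the loop: a Monte Carlo target (return 0) would not depend on `w`, and on-policy sampling would also update s2, pulling `w` back down.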

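Why:

Function approximation may implicitly tie Q(S, a) ≈ Q(S', a1): because the parameters are shared, an update that raises Q(S, a) also moves Q(S', a1), which is exactly the value the bootstrapped target for Q(S, a) is built from. With off-policy data, (S', a1) may never be corrected by its own updates, so the estimate can keep chasing a target that moves with it.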
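A small sketch of that coupling, assuming a linear Q-function Q(s, a) = w · φ(s, a) with hypothetical, similar feature vectors for (S, a) and (S', a1). One semi-gradient update applied only to (S, a) still changes Q(S', a1), by α · δ · (φ(S, a) · φ(S', a1)), so the next bootstrapped target has shifted too. The names `dot`, `td_step`, and the feature values are illustrative assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length feature/weight vectors."""
    return sum(a * b for a, b in zip(u, v))


def td_step(w, phi_sa, phi_next, reward, gamma, alpha):
    """One semi-gradient update on (S, a) only, bootstrapping from Q(S', a1)."""
    target = reward + gamma * dot(w, phi_next)  # bootstrapped target uses Q(S', a1)
    td_error = target - dot(w, phi_sa)
    return [wi + alpha * td_error * fi for wi, fi in zip(w, phi_sa)]


phi_sa = [1.0, 0.5]    # hypothetical features of (S, a)
phi_next = [0.9, 0.6]  # hypothetical features of (S', a1), deliberately similar to phi_sa
w = [0.0, 0.0]

q_next_before = dot(w, phi_next)
w = td_step(w, phi_sa, phi_next, reward=1.0, gamma=0.99, alpha=0.1)
q_next_after = dot(w, phi_next)

# Q(S', a1) moved although it was never updated directly
print(q_next_before, q_next_after)
```

Here the change in Q(S', a1) equals 0.1 · 1.0 · (1.0·0.9 + 0.5·0.6) = 0.12, so the next target for Q(S, a) rises from 1.0 to 1.0 + 0.99 · 0.12.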

posted @ 2025-02-13 18:44  霜尘FrostDust