Representation and General Value Functions: General Value Functions (GVFs)
Original link: https://sites.ualberta.ca/~pilarski/docs/theses/Sherstan_Craig_D_202009_PhD.pd
General value functions (GVFs) make two relaxations to the value function definition we have already considered (Sutton, Modayil, et al., 2011). First, we are free to choose any signal available to the agent as the prediction target, not just reward. We refer to the prediction target as the cumulant, $C$. Second, the discount parameter, $\gamma$, is replaced by a transition-dependent continuation function: $\gamma_{t+1} \equiv \gamma(S_t, A_t, S_{t+1})$ (White, 2017). (Note that under this definition $\gamma$ need not lie in $[0,1]$, and can even be complex-valued (De Asis et al., 2018).) This function goes by several names in the literature, including the continuation function, the discount, and the timescale. With these two generalizations we define the return as $G_t \equiv \sum_{k=0}^{\infty} \left( \prod_{j=1}^{k} \gamma_{t+j} \right) C_{t+k+1}$.
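To make the generalized return concrete, here is a minimal Python sketch. It assumes the usual form in which each future cumulant is weighted by the product of the continuation values on the transitions leading up to it, which gives the recursion G_t = C_{t+1} + γ_{t+1} G_{t+1}. The function name `gvf_returns`, the list-based trajectory layout, and the assumption that the return after the last recorded transition is zero are illustrative choices, not taken from the thesis.

```python
from typing import List, Sequence


def gvf_returns(cumulants: Sequence[float],
                continuations: Sequence[float]) -> List[float]:
    """Compute the GVF return G_t for every step of a recorded trajectory.

    cumulants[t] holds C_{t+1} and continuations[t] holds gamma_{t+1},
    i.e. both are observed on the transition out of time step t.
    Assumption: the return after the final recorded transition is zero.
    """
    if len(cumulants) != len(continuations):
        raise ValueError("cumulants and continuations must have equal length")

    returns = [0.0] * len(cumulants)
    g = 0.0  # return beyond the end of the trajectory (assumed zero)
    # Work backwards using G_t = C_{t+1} + gamma_{t+1} * G_{t+1}.
    for t in reversed(range(len(cumulants))):
        g = cumulants[t] + continuations[t] * g
        returns[t] = g
    return returns


if __name__ == "__main__":
    # The continuation drops to 0 on the last transition, which cuts off
    # the prediction there (for example, when some event of interest occurs).
    print(gvf_returns([1.0, 0.0, 2.0], [0.5, 0.5, 0.0]))  # [1.5, 1.0, 2.0]
```

Setting every continuation value to the same constant recovers the ordinary discounted return, and using the reward as the cumulant recovers the standard value function target.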
Like a value function, a GVF is defined by three components: the policy, the timescale, and the prediction target. GVFs allow the agent to express representation elements in the form of predictive questions. Consider the following examples for a mobile robot: