Robotics

GAE: Effect of the lambda Parameter on the Bias-Variance Tradeoff

hard, MCQ

General

In policy-gradient methods such as PPO, the advantage is commonly estimated with Generalized Advantage Estimation (GAE):

$\hat{A}_t^{\text{GAE}(\gamma,\lambda)} = \sum_{l=0}^{\infty} (\gamma\lambda)^l \, \delta_{t+l}, \qquad \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$

where $V$ is the learned value (critic) baseline. Consider the role of the parameter $\lambda \in [0,1]$ (with $\gamma$ fixed). Which of the following statements is correct?
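For reference, the GAE sum is usually evaluated on a finite trajectory with the backward recursion $\hat{A}_t = \delta_t + \gamma\lambda\,\hat{A}_{t+1}$. The sketch below is a minimal pure-Python illustration, not a reference implementation. The function name `gae_advantages`, its argument layout, and the default values are assumptions made for this example, as is the truncation of the infinite sum at the end of a single trajectory (with `values` carrying one extra bootstrap entry).

```python
from typing import List, Sequence


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Finite-horizon GAE for one trajectory (hypothetical helper).

    `values` must hold len(rewards) + 1 entries: V(s_0), ..., V(s_T).
    The last entry is the bootstrap value (use 0.0 if s_T is terminal).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have one more entry than rewards")

    advantages = [0.0] * len(rewards)
    running = 0.0  # holds A_{t+1} while iterating backwards in time
    for t in reversed(range(len(rewards))):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Recursive form of the sum: A_t = delta_t + gamma * lam * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Calling the function on the same trajectory with several values of `lam` shows how much weight each estimate places on the later residuals $\delta_{t+l}$.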