GAE: Effect of the lambda Parameter on the Bias-Variance Tradeoff
hardmcq
General
In policy-gradient methods such as PPO, the advantage is commonly estimated with Generalized Advantage Estimation (GAE):
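$$
\hat{A}_t^{\mathrm{GAE}(\gamma,\lambda)} = \sum_{l=0}^{\infty} (\gamma\lambda)^l \, \delta_{t+l}, \qquad \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t),
$$
where $V$ is the learned value (critic) baseline. Consider the role of the parameter $\lambda$ (with $\gamma$ fixed). Which of the following statements is correct?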
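For concreteness, a minimal sketch of how the estimator is often computed on a finite trajectory, using the backward recursion $\hat{A}_t = \delta_t + \gamma\lambda\,\hat{A}_{t+1}$. The function name `gae_advantages`, the argument layout (a `values` array with one extra bootstrap entry), and the toy numbers are assumptions made for illustration, not part of the question.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Hypothetical helper: GAE advantages for one finite trajectory.

    rewards: length-T sequence of r_t.
    values:  length-(T+1) sequence of V(s_t); the last entry is the
             bootstrap value of the final state (0.0 if it is terminal).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    T = len(rewards)

    # TD residuals: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    deltas = rewards + gamma * values[1:] - values[:-1]

    advantages = np.zeros(T)
    running = 0.0
    # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(T)):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages

# Toy trajectory (made-up numbers); gamma is held fixed while lam varies.
rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.4, 0.3, 0.0]
for lam in (0.0, 0.5, 1.0):
    print(lam, gae_advantages(rewards, values, gamma=0.9, lam=lam))
```

Running the loop with $\gamma$ held fixed shows how changing $\lambda$ alone reweights the later TD residuals $\delta_{t+l}$ in each estimate.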