Principle: Farama Foundation Gymnasium Generalized Advantage Estimation
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Policy_Gradient |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
A variance reduction technique for policy gradient methods that computes advantage estimates using an exponentially weighted sum of temporal-difference residuals.
Description
Generalized Advantage Estimation (GAE) addresses the bias-variance tradeoff in advantage estimation for policy gradient algorithms. Raw Monte Carlo returns have high variance, while single-step TD errors have high bias. GAE interpolates between these extremes using a parameter $\lambda \in [0, 1]$:
- $\lambda = 0$: single-step TD residual (low variance, high bias)
- $\lambda = 1$: Monte Carlo returns with a value baseline (high variance, low bias)
GAE is the standard advantage estimator in PPO, A2C, and other actor-critic methods. It provides smooth control over the bias-variance tradeoff; $\lambda = 0.95$ is a widely-used default.
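The two endpoints of the interpolation can be checked numerically. The sketch below uses made-up rewards and values for a single short trajectory; the helper name `gae` and the specific numbers are illustrative assumptions, not part of any library API.

```python
# Sketch: GAE at its two lambda extremes, for a single short trajectory.
# Rewards and values below are made-up illustrative numbers.
gamma = 0.99

rewards = [1.0, 1.0, 1.0]        # r_0 .. r_2
values  = [0.5, 0.6, 0.7, 0.0]   # V(s_0) .. V(s_3); episode ends after step 2

def gae(rewards, values, gamma, lam):
    """Backward GAE pass: A_t = delta_t + gamma * lam * A_{t+1}."""
    adv, next_adv = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        next_adv = delta + gamma * lam * next_adv
        adv[t] = next_adv
    return adv

# lambda = 0 reduces to the one-step TD error delta_t
td = [rewards[t] + gamma * values[t + 1] - values[t] for t in range(3)]
assert gae(rewards, values, gamma, 0.0) == td

# lambda = 1 reduces to the Monte Carlo return minus the baseline V(s_t)
# (the sum of deltas telescopes; V(s_3) = 0 here, so no tail bootstrap remains)
mc = [sum(gamma ** (k - t) * rewards[k] for k in range(t, 3)) - values[t]
      for t in range(3)]
assert all(abs(a - b) < 1e-9 for a, b in zip(gae(rewards, values, gamma, 1.0), mc))
```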
Usage
Use GAE when implementing actor-critic algorithms that require advantage estimates. It is computed after collecting a batch of trajectories from vectorized environments and requires a learned value function for bootstrapping.
Theoretical Basis
The TD residual at time $t$:

$$\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$$

The GAE advantage estimate:

$$\hat{A}_t^{\mathrm{GAE}(\gamma, \lambda)} = \sum_{l=0}^{\infty} (\gamma \lambda)^l \, \delta_{t+l}$$

This can be computed efficiently via backward recursion:

$$\hat{A}_t = \delta_t + \gamma \lambda \hat{A}_{t+1}$$

At episode boundaries (terminated=True), the bootstrap value $V(s_{t+1})$ is set to zero and the recursion restarts, so no advantage leaks across episodes. At truncation (truncated=True), the episode did not end in the MDP sense, so the critic's estimate $V(s_{t+1})$ of the final observation is used for bootstrapping.
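The boundary handling above can be sketched as a vectorized backward pass over a `(T, num_envs)` batch. The function name `compute_gae` and the convention that the caller supplies `bootstrap_values[t] = V(s_{t+1})` (computed from the final observation when a step is truncated) are assumptions of this sketch:

```python
import numpy as np

def compute_gae(rewards, values, bootstrap_values, terminated, truncated,
                gamma=0.99, lam=0.95):
    """Backward GAE recursion over a (T, num_envs) batch.

    rewards, values, bootstrap_values: float arrays of shape (T, N)
      values[t] = V(s_t); bootstrap_values[t] = V(s_{t+1}), evaluated on the
      final observation when step t was truncated.
    terminated, truncated: bool arrays of shape (T, N)
    """
    T, N = rewards.shape
    advantages = np.zeros((T, N))
    next_adv = np.zeros(N)
    for t in reversed(range(T)):
        # terminated: V(s_{t+1}) = 0; truncated: keep V(s_{t+1}) for bootstrap
        boot = np.where(terminated[t], 0.0, bootstrap_values[t])
        delta = rewards[t] + gamma * boot - values[t]
        # the carried advantage resets at either kind of episode boundary
        done = terminated[t] | truncated[t]
        next_adv = delta + gamma * lam * np.where(done, 0.0, next_adv)
        advantages[t] = next_adv
    return advantages
```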