
Principle:Farama Foundation Gymnasium Generalized Advantage Estimation

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, Policy_Gradient
Last Updated 2026-02-15 03:00 GMT

Overview

A variance reduction technique for policy gradient methods that computes advantage estimates using an exponentially weighted sum of temporal-difference residuals.

Description

Generalized Advantage Estimation (GAE) addresses the bias-variance tradeoff in advantage estimation for policy gradient algorithms. Raw Monte Carlo returns have high variance, while single-step TD errors have high bias. GAE interpolates between these extremes using a parameter λ ∈ [0, 1]:

  • λ=0: single-step TD residual (low variance, high bias)
  • λ=1: Monte Carlo returns (high variance, low bias)

GAE is the standard advantage estimator in PPO, A2C, and other actor-critic methods. It provides smooth control over the bias-variance tradeoff, typically with λ=0.95 as a widely-used default.

Usage

Use GAE when implementing actor-critic algorithms that require advantage estimates. It is computed after collecting a batch of trajectories from vectorized environments and requires a learned value function for bootstrapping.

Theoretical Basis

The TD residual at time t: δ_t = r_t + γ·V(s_{t+1}) − V(s_t)

The GAE advantage estimate: Â_t^{GAE(γ,λ)} = Σ_{l=0}^{T−t−1} (γλ)^l · δ_{t+l}

This can be computed efficiently via backward recursion: Â_T = 0, then Â_t = δ_t + γλ·Â_{t+1} for t = T−1, …, 0.
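The backward recursion above can be sketched as follows. This is a minimal single-trajectory implementation, not taken from any particular library; the function name `compute_gae` and its signature are illustrative.

```python
import numpy as np

def compute_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """GAE advantages for one uninterrupted trajectory, via backward recursion.

    rewards:    (T,) array of rewards r_t
    values:     (T,) array of value estimates V(s_t)
    last_value: bootstrap value V(s_T) for the state after the final step
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0                 # corresponds to A-hat_T = 0
    next_value = last_value
    for t in reversed(range(T)):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # Recursion: A-hat_t = delta_t + gamma * lambda * A-hat_{t+1}
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages
```

Setting `lam=0.0` recovers pure one-step TD advantages, and `lam=1.0` with `gamma=1.0` recovers undiscounted Monte Carlo returns minus the baseline, matching the two extremes listed earlier.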

At episode boundaries (terminated=True), V(s_{t+1}) = 0. At truncation (truncated=True), V(s_{t+1}) is still used for bootstrapping.
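The terminated/truncated distinction can be folded into the recursion with two masks: termination zeroes the bootstrap value, while both kinds of episode end stop the advantage carry-over. This is a hedged sketch assuming Gymnasium's flag semantics; the function name `compute_gae_masked` and the per-step `next_values` layout are assumptions for illustration.

```python
import numpy as np

def compute_gae_masked(rewards, values, next_values, terminated, truncated,
                       gamma=0.99, lam=0.95):
    """GAE over a rollout that may contain episode boundaries.

    terminated[t]: True -> s_{t+1} has no value, so V(s_{t+1}) = 0 in delta_t
    truncated[t]:  True -> V(s_{t+1}) is kept for bootstrapping,
                   but the recursion still resets across the boundary
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        # Termination zeroes the bootstrap; truncation keeps it.
        bootstrap = 0.0 if terminated[t] else next_values[t]
        delta = rewards[t] + gamma * bootstrap - values[t]
        # Either kind of episode end stops the (gamma*lambda) accumulation.
        carry = 0.0 if (terminated[t] or truncated[t]) else gae
        gae = delta + gamma * lam * carry
        advantages[t] = gae
    return advantages
```

The key design point is that the two flags are not interchangeable: conflating them either leaks value across resets or discards the bootstrap at time limits, both of which bias the advantage estimates.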

Related Pages

Implemented By
