Principle:Farama Foundation Gymnasium Episode Time Limiting

Knowledge Sources	Gymnasium Wrappers, Sutton and Barto RL
Domains	Reinforcement_Learning, Episode_Management
Last Updated	2026-02-15 03:00 GMT

Overview

A mechanism that enforces a maximum episode duration by truncating episodes once they reach a step limit, distinct from MDP-defined termination.

Description

Episode Time Limiting imposes an external constraint on episode length independent of the environment's internal dynamics. After a specified number of steps, the episode is truncated (not terminated), signaling to the agent that the episode ended due to time, not because a terminal MDP state was reached.

This distinction is critical for RL algorithms:

Terminated: The value of the terminal state is 0 (no future rewards are possible)
Truncated: The value of the last observed state may be non-zero (the agent was stopped artificially, so future rewards were still possible)

Correct handling prevents bias in value function estimation. Time limits are typically set via max_episode_steps during environment registration.

Usage

Use time limiting for environments where episodes could run indefinitely without intervention. Most registered Gymnasium environments include a default max_episode_steps that is automatically applied by gymnasium.make().

Theoretical Basis

Time truncation at step $T_{\text{max}}$:

$\text{truncated}_{t} = \begin{cases} \text{True} & \text{if } t \geq T_{\text{max}} \\ \text{False} & \text{otherwise} \end{cases}$
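
The truncation rule can be sketched without any library. The example below is a minimal illustration, not the Gymnasium implementation: CountingEnv and StepLimit are hypothetical names, and the toy environment returns a simplified (observation, reward, terminated) tuple rather than the full Gymnasium step signature.

```python
class CountingEnv:
    """Hypothetical toy environment that never terminates on its own."""

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += 1
        reward = 1.0
        terminated = False  # this toy MDP has no terminal state
        return self.state, reward, terminated


class StepLimit:
    """Illustrative time limit: truncated becomes True once t >= max_steps."""

    def __init__(self, env, max_steps):
        self.env = env
        self.max_steps = max_steps
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0  # the step counter restarts with every episode
        return self.env.reset()

    def step(self, action):
        obs, reward, terminated = self.env.step(action)
        self.elapsed += 1
        truncated = self.elapsed >= self.max_steps
        return obs, reward, terminated, truncated


env = StepLimit(CountingEnv(), max_steps=3)
env.reset()
# Keep only the (terminated, truncated) pair from each step.
flags = [env.step(action=None)[2:] for _ in range(3)]
print(flags)  # [(False, False), (False, False), (False, True)]
```

The wrapper never touches the terminated flag, so the time limit stays separate from the environment's own dynamics.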

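For correct bootstrapping in temporal-difference learning, the next-state value $\hat{V}(s_{t+1})$ used in the TD target $r_{t+1} + \gamma \hat{V}(s_{t+1})$ is:

$\hat{V}(s_{t+1}) = \begin{cases} 0 & \text{if terminated} \\ V(s_{t+1}) & \text{if truncated (bootstrap)} \end{cases}$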
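
A minimal sketch of how this rule might appear in a one-step TD target follows. The function name td_target, the discount factor gamma, and the numeric values are assumptions made for the example.

```python
def td_target(reward, next_value, terminated, truncated, gamma=0.99):
    """One-step TD target that bootstraps unless the episode terminated."""
    # The truncated flag is accepted only to make explicit that it does NOT
    # suppress bootstrapping; only true termination zeroes the next value.
    bootstrap = 0.0 if terminated else next_value
    return reward + gamma * bootstrap


# Terminal MDP state: no future reward, so the target is the reward alone.
on_termination = td_target(1.0, 10.0, terminated=True, truncated=False, gamma=0.5)

# Time-limit truncation: the next state's estimated value still counts.
on_truncation = td_target(1.0, 10.0, terminated=False, truncated=True, gamma=0.5)

print(on_termination, on_truncation)  # 1.0 6.0
```

Treating a truncated transition like a terminated one would give a target of 1.0 instead of 6.0 here, which is the bias the distinction is meant to avoid.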

Related Pages

Implemented By

Implementation:Farama_Foundation_Gymnasium_TimeLimit_Wrapper
