Principle:Farama Foundation Gymnasium Episode Time Limiting
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Episode_Management |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
A mechanism that enforces a maximum episode duration by truncating episodes once they reach a step limit, distinct from MDP-defined termination.
Description
Episode Time Limiting imposes an external constraint on episode length independent of the environment's internal dynamics. After a specified number of steps, the episode is truncated (not terminated), signaling to the agent that the episode ended due to time, not because a terminal MDP state was reached.
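The mechanism described above can be sketched without any library. This is a minimal illustration only, not Gymnasium's actual wrapper: `CountingEnv`, `StepLimit`, and `run_episode` are hypothetical names, and the five-value `step` return simply mirrors the (observation, reward, terminated, truncated, info) convention.

```python
class CountingEnv:
    """Hypothetical toy environment that never terminates on its own."""

    def reset(self):
        self.state = 0
        return self.state, {}

    def step(self, action):
        self.state += 1
        # observation, reward, terminated, truncated, info
        return self.state, 1.0, False, False, {}


class StepLimit:
    """Hypothetical wrapper that truncates an episode after max_steps steps."""

    def __init__(self, env, max_steps):
        self.env = env
        self.max_steps = max_steps
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0  # the step counter restarts with every episode
        return self.env.reset()

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.elapsed += 1
        if self.elapsed >= self.max_steps:
            # Time ran out: set truncated, leave terminated untouched.
            truncated = True
        return obs, reward, terminated, truncated, info


def run_episode(env):
    """Step until the episode ends; report how and when it ended."""
    env.reset()
    steps = 0
    while True:
        _, _, terminated, truncated, _ = env.step(None)
        steps += 1
        if terminated or truncated:
            return steps, terminated, truncated


print(run_episode(StepLimit(CountingEnv(), max_steps=5)))
```

The wrapper only ever sets the truncation flag, so the environment's own termination signal passes through unchanged and the two causes of episode end stay distinguishable.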
This distinction is critical for RL algorithms:
- Terminated: The value of the terminal state is 0 (no future rewards possible)
- Truncated: The value of the state may be non-zero (the agent was artificially stopped)
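Correct handling prevents bias in value function estimation: treating a truncated state as terminal discards the future return the agent could still have collected, which distorts value estimates for states near the time limit. Time limits are typically set via max_episode_steps during environment registration.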
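The effect of the two flags on a one-step bootstrapped target can be sketched as follows. This is a minimal illustration under assumed names: `td_target`, `naive_td_target`, and the default `gamma` are hypothetical and not part of Gymnasium.

```python
def td_target(reward, next_value, terminated, gamma=0.99):
    """One-step target that stops bootstrapping only on true termination."""
    if terminated:
        return reward  # terminal state: no future reward is possible
    # Non-terminal or truncated: the next state still has value.
    return reward + gamma * next_value


def naive_td_target(reward, next_value, terminated, truncated, gamma=0.99):
    """Biased variant that treats truncation like termination."""
    done = terminated or truncated
    return reward if done else reward + gamma * next_value


# Last transition of a truncated episode: the next state is worth 10.0.
print(td_target(1.0, 10.0, terminated=False, gamma=0.5))
print(naive_td_target(1.0, 10.0, terminated=False, truncated=True, gamma=0.5))
```

On the truncated transition the first function keeps the discounted value of the next state, while the naive variant drops it, which is the bias the distinction is meant to avoid.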
Usage
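Use time limiting for environments where episodes could run indefinitely without intervention. Most registered Gymnasium environments include a default max_episode_steps, which gymnasium.make() applies automatically by wrapping the environment in the TimeLimit wrapper. Passing max_episode_steps to gymnasium.make() overrides the registered default.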
Theoretical Basis
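Time truncation at step $T_{\max}$ (the value of max_episode_steps), where $t$ counts the steps taken since the last reset:

$$\text{truncated}_t = \mathbb{1}\left[\, t \geq T_{\max} \,\right]$$

For correct bootstrapping in temporal-difference learning, the one-step target drops the next-state value only on termination, never on truncation:

$$y_t = r_t + \gamma \,(1 - \text{terminated}_t)\, V(s_{t+1})$$

Here $r_t$ is the reward for the transition, $\gamma$ is the discount factor, and $V(s_{t+1})$ is the estimated value of the next state. When an episode is truncated but not terminated, the target is still $r_t + \gamma V(s_{t+1})$.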