Principle: Farama Foundation Gymnasium Episode Statistics Tracking
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Monitoring |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
A monitoring pattern that transparently tracks cumulative rewards, episode lengths, and timing information across episodes during RL training and evaluation.
Description
Episode Statistics Tracking is a cross-cutting concern in reinforcement learning that accumulates per-episode metrics without modifying the core environment logic. Using the wrapper/decorator pattern, statistics tracking intercepts the step and reset methods to compute:
- Cumulative reward (sum of rewards within an episode)
- Episode length (number of steps)
- Episode time (wall-clock duration)
These metrics are stored in fixed-size buffers (deques) and injected into the info dictionary at episode boundaries (when terminated or truncated is True); Gymnasium's RecordEpisodeStatistics wrapper exposes them under info["episode"] with keys "r" (return), "l" (length), and "t" (elapsed time). This enables monitoring training progress, computing rolling averages, and triggering early stopping without coupling logging logic to the agent or environment.
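The wrapper pattern described above can be sketched in plain Python. Note that `DummyEnv`, `EpisodeStatsWrapper`, and the `buffer_length` parameter are illustrative names for this sketch, not the Gymnasium API (Gymnasium ships a production version as `gymnasium.wrappers.RecordEpisodeStatistics`):

```python
import time
from collections import deque

class DummyEnv:
    """Stand-in environment: episode ends after 3 steps, reward 1.0 per step."""
    def reset(self):
        self._t = 0
        return 0, {}                      # observation, info

    def step(self, action):
        self._t += 1
        terminated = self._t >= 3
        return 0, 1.0, terminated, False, {}  # obs, reward, terminated, truncated, info

class EpisodeStatsWrapper:
    """Intercepts reset/step to accumulate per-episode metrics (sketch)."""
    def __init__(self, env, buffer_length=100):
        self.env = env
        self.return_queue = deque(maxlen=buffer_length)
        self.length_queue = deque(maxlen=buffer_length)
        self.time_queue = deque(maxlen=buffer_length)

    def reset(self):
        # Zero the accumulators at the start of each episode.
        self._episode_return = 0.0
        self._episode_length = 0
        self._start_time = time.perf_counter()
        return self.env.reset()

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._episode_return += reward
        self._episode_length += 1
        if terminated or truncated:
            elapsed = time.perf_counter() - self._start_time
            self.return_queue.append(self._episode_return)
            self.length_queue.append(self._episode_length)
            self.time_queue.append(elapsed)
            # Inject metrics at the episode boundary, mirroring the
            # info["episode"] = {"r", "l", "t"} convention.
            info = {**info, "episode": {"r": self._episode_return,
                                        "l": self._episode_length,
                                        "t": elapsed}}
        return obs, reward, terminated, truncated, info
```

Because the wrapper only intercepts `reset` and `step`, the underlying environment needs no changes, and any number of such wrappers can be stacked.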
Usage
Use this principle whenever training or evaluating an RL agent and you need to track performance metrics over episodes. It is essential for plotting learning curves, computing average returns, and monitoring convergence.
Theoretical Basis
Tracking accumulates at each step and flushes at episode boundaries:

    # Abstract algorithm
    on_reset():
        episode_return = 0
        episode_length = 0
        start_time = now()

    on_step(reward, done):
        episode_return += reward
        episode_length += 1
        if done:
            return_queue.append(episode_return)
            length_queue.append(episode_length)
            time_queue.append(now() - start_time)
Using bounded deques ensures constant memory usage regardless of training duration.
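A minimal illustration of that bounded-memory property: a `deque` with `maxlen` silently evicts the oldest entry on each append once full, so the buffer never grows past its limit:

```python
from collections import deque

# Buffer capped at 3 episodes for illustration.
return_queue = deque(maxlen=3)
for ep_return in [10.0, 12.0, 9.0, 15.0, 11.0]:
    return_queue.append(ep_return)

print(list(return_queue))  # [9.0, 15.0, 11.0] -- oldest entries evicted
```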