Principle:Farama Foundation Gymnasium Episode Statistics Tracking

From Leeroopedia
Knowledge Sources
Domains: Reinforcement_Learning, Monitoring
Last Updated: 2026-02-15 03:00 GMT

Overview

A monitoring pattern that transparently tracks cumulative rewards, episode lengths, and timing information across episodes during RL training and evaluation.

Description

Episode Statistics Tracking is a cross-cutting concern in reinforcement learning that accumulates per-episode metrics without modifying the core environment logic. Using the wrapper/decorator pattern, statistics tracking intercepts the step and reset methods to compute:

  • Cumulative reward (sum of rewards within an episode)
  • Episode length (number of steps)
  • Episode time (wall-clock duration)

These metrics are stored in fixed-size buffers (deques) and injected into the info dictionary at episode boundaries (when terminated or truncated is True). This enables monitoring training progress, computing rolling averages, and triggering early stopping without coupling logging logic to the agent or environment.
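
The interception described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Gymnasium's actual implementation: `ToyEnv`, `EpisodeStatsWrapper`, the `buffer_length` parameter, and the `"episode"` info key with its `"r"`, `"l"`, `"t"` fields are names chosen for this sketch. The toy environment only mimics the `reset`/`step` return shapes so the example runs without extra dependencies.

```python
import time
from collections import deque


class ToyEnv:
    """Hypothetical stand-in environment: reward 1.0 per step, ends after 3 steps."""

    def reset(self):
        self.t = 0
        return 0, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        terminated = self.t >= 3
        # (observation, reward, terminated, truncated, info)
        return self.t, 1.0, terminated, False, {}


class EpisodeStatsWrapper:
    """Intercepts reset/step to track per-episode statistics (illustrative only)."""

    def __init__(self, env, buffer_length=100):
        self.env = env
        # Bounded buffers: the oldest entry is dropped once maxlen is reached.
        self.return_queue = deque(maxlen=buffer_length)
        self.length_queue = deque(maxlen=buffer_length)
        self.time_queue = deque(maxlen=buffer_length)

    def reset(self):
        self.episode_return = 0.0
        self.episode_length = 0
        self.start_time = time.perf_counter()
        return self.env.reset()

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.episode_return += reward
        self.episode_length += 1
        if terminated or truncated:
            elapsed = time.perf_counter() - self.start_time
            # Key names "episode", "r", "l", "t" are assumptions of this sketch.
            info["episode"] = {
                "r": self.episode_return,
                "l": self.episode_length,
                "t": elapsed,
            }
            self.return_queue.append(self.episode_return)
            self.length_queue.append(self.episode_length)
            self.time_queue.append(elapsed)
        return obs, reward, terminated, truncated, info


env = EpisodeStatsWrapper(ToyEnv())
obs, info = env.reset()
done = False
while not done:
    obs, reward, terminated, truncated, info = env.step(action=0)
    done = terminated or truncated
```

The agent loop only ever sees the usual `step` and `reset` calls; the statistics appear in `info` on the final step of the episode, so neither the environment nor the agent needs to know that tracking is happening.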

Usage

Use this principle whenever training or evaluating an RL agent and you need to track performance metrics over episodes. It is essential for plotting learning curves, computing average returns, and monitoring convergence.
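
As a rough illustration of how the buffered returns might be consumed, the sketch below computes a rolling mean and a simple stopping check. The helper names `rolling_mean` and `should_stop`, the `target_return` threshold, and the sample returns are all hypothetical; they are not part of any library API.

```python
from collections import deque
from statistics import fmean


def rolling_mean(return_queue):
    """Mean of the episode returns currently held in the buffer."""
    return fmean(return_queue) if return_queue else float("nan")


def should_stop(return_queue, target_return):
    """True once the buffer is full and its mean reaches the target."""
    full = len(return_queue) == return_queue.maxlen
    return full and rolling_mean(return_queue) >= target_return


returns = deque(maxlen=3)  # illustrative window of 3 episodes
history = []               # one rolling-mean point per finished episode
for episode_return in [1.0, 2.0, 3.0, 4.0, 5.0]:
    returns.append(episode_return)
    history.append(rolling_mean(returns))
    if should_stop(returns, target_return=3.0):
        break
```

The `history` list is the kind of series one would plot as a learning curve. Requiring a full buffer before stopping is a design choice in this sketch that avoids reacting to one or two lucky early episodes.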

Theoretical Basis

Tracking accumulates metrics at every step and flushes them to the buffers at episode boundaries:

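# Abstract algorithm, written as a runnable Python sketch
import time
from collections import deque

class EpisodeStats:
    def __init__(self, buffer_length=100):  # illustrative default size
        # Bounded buffers: the oldest entry is dropped once full.
        self.return_queue = deque(maxlen=buffer_length)
        self.length_queue = deque(maxlen=buffer_length)
        self.time_queue = deque(maxlen=buffer_length)

    def on_reset(self):
        self.episode_return = 0.0
        self.episode_length = 0
        self.start_time = time.perf_counter()

    def on_step(self, reward, done):
        # done = terminated or truncated
        self.episode_return += reward
        self.episode_length += 1
        if done:
            self.return_queue.append(self.episode_return)
            self.length_queue.append(self.episode_length)
            self.time_queue.append(time.perf_counter() - self.start_time)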

Using bounded deques ensures constant memory usage regardless of training duration.
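
The memory bound can be checked directly with Python's `collections.deque`. In this small sketch the buffer size of 3 and the ten synthetic returns are illustrative values only.

```python
from collections import deque

return_queue = deque(maxlen=3)  # maxlen=3 is an illustrative size
for episode_return in range(10):
    # Appending to a full deque silently discards the oldest entry.
    return_queue.append(float(episode_return))

print(list(return_queue))
print(len(return_queue))
```

After ten appends the buffer still holds only the three most recent returns, so its size depends on `maxlen` and not on how many episodes have been run.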

Related Pages

Implemented By
