

Implementation:Farama Foundation Gymnasium Play

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, Interactive_Visualization
Last Updated 2026-02-15 03:00 GMT

Overview

Provides interactive human play capabilities for Gymnasium environments through keyboard input using PyGame, with optional live metric plotting via matplotlib.

Description

The play module enables humans to interact with Gymnasium environments through keyboard controls rendered via PyGame. It contains three main components:

PlayableGame is a class that wraps an environment and manages a PyGame window. It tracks which keys are currently held down so they can be translated into environment actions. It requires the environment to use the rgb_array or rgb_array_list render mode, and its constructor raises an error otherwise. It also supports an optional zoom factor for the rendered output. It handles PyGame key-press, key-release, window-resize, and quit events. Closing the window or pressing Escape stops the game.
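
To make the key-tracking bookkeeping concrete, here is a minimal sketch that does not use pygame. The class name `KeyTracker`, the `(kind, key)` event arguments, and the `ESCAPE` constant are hypothetical stand-ins for PlayableGame and `pygame.event.Event` objects. Window management and resizing are left out entirely, so read it as an illustration of the idea, not the library's code.

```python
from __future__ import annotations

from dataclasses import dataclass, field

ESCAPE = 27  # hypothetical stand-in for pygame.K_ESCAPE


@dataclass
class KeyTracker:
    """Hypothetical stand-in for PlayableGame's pressed-key bookkeeping."""

    relevant_keys: set[int]
    pressed_keys: list[int] = field(default_factory=list)
    running: bool = True

    def process_event(self, kind: str, key: int | None = None) -> None:
        """Handle a simplified event: "keydown", "keyup", or "quit"."""
        if kind == "keydown":
            if key in self.relevant_keys:
                # Only keys used by some mapped combination are tracked.
                self.pressed_keys.append(key)
            elif key == ESCAPE:
                self.running = False
        elif kind == "keyup":
            # Guard added in this sketch so a stray release cannot raise.
            if key in self.pressed_keys:
                self.pressed_keys.remove(key)
        elif kind == "quit":
            # Closing the window ends the loop.
            self.running = False


tracker = KeyTracker(relevant_keys={ord("a"), ord("d")})
tracker.process_event("keydown", ord("a"))
tracker.process_event("keydown", ord("x"))  # not part of any mapping: ignored
print(tracker.pressed_keys)  # [97]
tracker.process_event("keyup", ord("a"))
tracker.process_event("keydown", ESCAPE)
print(tracker.running)  # False
```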

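play() is the main function that runs the interactive game loop. It accepts a keys_to_action dictionary that maps keyboard combinations to actions. If no dictionary is given, it falls back to the environment's get_keys_to_action() method and raises an error when no such method exists. Keys can be encoded as single ints, tuples of ints, tuples of characters, or strings. In a string, each character is one key, so "wa" means W and A pressed together. Whenever no key, or an unmapped combination, is held, the noop action is used. The loop runs at a user-specified fps or, by default, at the environment's declared render_fps (30 if the environment does not declare one). An optional callback is executed after every step with the transition data. Setting wait_on_player=True makes interaction turn-based: the environment steps only while a mapped key is held down.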
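
The supported key encodings all reduce to one canonical form: a sorted tuple of integer key codes. Characters are converted with `ord()`, and iterating a string yields its characters, so `"wa"` is read as the combination of W and A. The sketch below reproduces that normalization and the lookup that falls back to the noop action. The helper names `normalize_keys` and `lookup_action` are hypothetical and not part of `gymnasium.utils.play`.

```python
from __future__ import annotations


def normalize_keys(keys_to_action: dict) -> dict[tuple[int, ...], object]:
    """Map every key encoding to a sorted tuple of int key codes (hypothetical helper)."""
    normalized = {}
    for combo, action in keys_to_action.items():
        if isinstance(combo, int):
            combo = (combo,)  # a single key code
        # Iterating a str yields characters, so "wa" means W + A.
        codes = tuple(sorted(ord(k) if isinstance(k, str) else k for k in combo))
        normalized[codes] = action
    return normalized


def lookup_action(normalized: dict, pressed_keys: list[int], noop):
    """Return the action for the currently held keys, or noop if unmapped."""
    return normalized.get(tuple(sorted(pressed_keys)), noop)


mapping = normalize_keys({"w": 0, "wa": 1, (ord("d"),): 2, ("s", "d"): 3, 32: 4})
print(mapping)  # {(119,): 0, (97, 119): 1, (100,): 2, (100, 115): 3, (32,): 4}
print(lookup_action(mapping, [119, 97], noop=-1))  # 1 (press order does not matter)
print(lookup_action(mapping, [120], noop=-1))      # -1 (unmapped -> noop)
```

Sorting the codes is what makes a combination independent of the order in which the keys were pressed.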

PlayPlot is a helper class that draws live matplotlib scatter plots of arbitrary metrics during play. It wraps a user callback that receives each environment transition and returns one value per name in plot_names. Each metric gets its own subplot, which shows only the most recent horizon_timesteps values. The instance's callback method is what gets passed to play() as its callback argument.
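
The sliding-window behaviour can be reasoned about without matplotlib. The class below is a hypothetical sketch: the names `RollingMetrics`, `record`, and `window` are illustrative. It keeps one bounded buffer per metric and computes the x-range that a live plot with a fixed horizon would show. The length check is an addition made for clarity.

```python
from __future__ import annotations

from collections import deque


class RollingMetrics:
    """Hypothetical sketch of PlayPlot-style buffering, without any plotting."""

    def __init__(self, horizon_timesteps: int, plot_names: list[str]):
        self.horizon = horizon_timesteps
        self.plot_names = plot_names
        self.t = 0  # number of transitions recorded so far
        # One bounded buffer per metric; old points fall off the left end.
        self.data = [deque(maxlen=horizon_timesteps) for _ in plot_names]

    def record(self, points: list[float]) -> None:
        # Length check added in this sketch: one value per plot name.
        if len(points) != len(self.plot_names):
            raise ValueError("callback must return one value per plot name")
        for point, series in zip(points, self.data):
            series.append(point)
        self.t += 1

    def window(self) -> tuple[range, list[list[float]]]:
        """x-values and y-values that would currently be drawn."""
        xmin = max(0, self.t - self.horizon)
        return range(xmin, self.t), [list(series) for series in self.data]


m = RollingMetrics(horizon_timesteps=3, plot_names=["reward"])
for r in [1.0, 0.0, 1.0, 1.0, 0.0]:
    m.record([r])
xs, ys = m.window()
print(list(xs), ys)  # [2, 3, 4] [[1.0, 1.0, 0.0]]
```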

Usage

Use the play() function to manually test and debug environments by interacting with them through the keyboard. Use PlayPlot to visualize reward curves or other metrics in real time during play sessions.
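
The PlayPlot callback receives the full transition `(obs_t, obs_tp1, action, rew, terminated, truncated, info)` and returns one value per plot name, so it can also track quantities that span several steps. As a hypothetical example, the factory `make_return_callback` below builds a callback that reports both the per-step reward and the running episode return. It resets the sum when an episode ends. It would be paired with `plot_names=["reward", "episode return"]`.

```python
from __future__ import annotations

from collections.abc import Callable


def make_return_callback() -> Callable[..., list[float]]:
    """Hypothetical factory returning a PlayPlot-style data callback."""
    episode_return = 0.0

    def callback(obs_t, obs_tp1, action, rew, terminated, truncated, info) -> list[float]:
        nonlocal episode_return
        episode_return += float(rew)
        points = [float(rew), episode_return]
        if terminated or truncated:
            # The next episode's sum starts from zero.
            episode_return = 0.0
        return points

    return callback


cb = make_return_callback()
print(cb(None, None, 0, 1.0, False, False, {}))  # [1.0, 1.0]
print(cb(None, None, 0, 1.0, True, False, {}))   # [1.0, 2.0]
print(cb(None, None, 0, 1.0, False, False, {}))  # [1.0, 1.0]
```

With Gymnasium and matplotlib installed, this could be wired up as `PlayPlot(make_return_callback(), 150, ["reward", "episode return"])` and handed to play() through `callback=plotter.callback`.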

Code Reference

Source Location

gymnasium/utils/play.py

Signature

def play(
    env: Env,
    transpose: bool | None = True,
    fps: int | None = None,
    zoom: float | None = None,
    callback: Callable | None = None,
    keys_to_action: dict[tuple[str | int, ...] | str | int, ActType] | None = None,
    seed: int | None = None,
    noop: ActType = 0,
    wait_on_player: bool = False,
) -> None

class PlayableGame:
    def __init__(self, env: Env, keys_to_action: dict | None = None, zoom: float | None = None)
    def process_event(self, event: Event)

class PlayPlot:
    def __init__(self, callback: Callable, horizon_timesteps: int, plot_names: list[str])
    def callback(self, obs_t, obs_tp1, action, rew, terminated, truncated, info)

Import

from gymnasium.utils.play import play, PlayPlot, PlayableGame

I/O Contract

Inputs

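| Name | Type | Required | Description |
|------|------|----------|-------------|
| env | gymnasium.Env | Yes | The environment to play; must use the rgb_array or rgb_array_list render mode |
| transpose | bool or None | No | Whether to transpose the rendered frame before displaying it (default True) |
| fps | int or None | No | Maximum number of environment steps per second (default: env.metadata["render_fps"], or 30 if unset) |
| zoom | float or None | No | Positive zoom factor applied to the rendered output |
| callback | Callable or None | No | Function called after each step as callback(obs_t, obs_tp1, action, rew, terminated, truncated, info) |
| keys_to_action | dict or None | No | Mapping from key combinations to actions; if None, the environment's get_keys_to_action() is used |
| seed | int or None | No | Seed passed to env.reset() each time an episode starts |
| noop | ActType | No | Action used when no key, or an unmapped key combination, is pressed (default 0) |
| wait_on_player | bool | No | If True, the environment steps only while a mapped key is held down (default False) |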
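
The inputs fps, noop, and wait_on_player interact in the per-frame loop roughly as follows. This is a simplified sketch that needs no display, and it rests on several assumptions. `FakeEnv` is a hypothetical stand-in for a Gymnasium environment. `run_frames` is a hypothetical helper that replaces the real event-driven loop with a precomputed list of held keys per frame. Rendering and frame pacing at `fps` are omitted.

```python
from __future__ import annotations


class FakeEnv:
    """Hypothetical environment: each episode is truncated after 3 steps."""

    metadata = {"render_fps": 4}

    def reset(self, seed=None):
        self.steps = 0
        return 0, {}

    def step(self, action):
        self.steps += 1
        truncated = self.steps >= 3
        return self.steps, 1.0, False, truncated, {}


def run_frames(env, key_code_to_action, held_keys_per_frame, noop=0,
               wait_on_player=False, fps=None, seed=None):
    """Return (resolved fps, list of actions actually stepped)."""
    if fps is None:
        fps = env.metadata.get("render_fps", 30)  # default when fps is not given
    done, actions = True, []
    for pressed in held_keys_per_frame:
        if done:
            # A frame that starts a new episode only resets; the same seed is reused.
            done = False
            env.reset(seed=seed)
        elif not wait_on_player or len(pressed) > 0:
            # Unmapped or empty combinations fall back to noop.
            action = key_code_to_action.get(tuple(sorted(pressed)), noop)
            _, _, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            actions.append(action)
    return fps, actions


frames = [(), (97,), (), (100,), ()]
print(run_frames(FakeEnv(), {(97,): 1, (100,): 2}, frames))
# (4, [1, 0, 2])
print(run_frames(FakeEnv(), {(97,): 1, (100,): 2}, frames, wait_on_player=True, fps=60))
# (60, [1, 2])
```

With wait_on_player=False, frames with no keys still step the environment using noop. With wait_on_player=True, those frames are skipped.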

Outputs

| Name | Type | Description |
|------|------|-------------|
| (none) | None | play() runs the interactive loop until the window is closed or Escape is pressed, then shuts down PyGame |

Usage Examples

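import gymnasium as gym
import numpy as np
from gymnasium.utils.play import play, PlayPlot

# Basic usage with the environment's default key mapping.
# ALE environments come from the separate ale-py package; importing it registers
# the "ALE/..." ids (gym.register_envs just marks the import as used).
import ale_py
gym.register_envs(ale_py)
env = gym.make("ALE/Pong-v5", render_mode="rgb_array")
play(env, zoom=3)

# Custom key-to-action mapping for a continuous action space.
# Each action is [steering, gas, brake]; strings such as "wa" could map key combinations.
play(
    gym.make("CarRacing-v3", render_mode="rgb_array"),
    keys_to_action={
        "w": np.array([0, 0.7, 0], dtype=np.float32),  # accelerate
        "a": np.array([-1, 0, 0], dtype=np.float32),   # steer left
        "s": np.array([0, 0, 1], dtype=np.float32),    # brake
        "d": np.array([1, 0, 0], dtype=np.float32),    # steer right
    },
    noop=np.array([0, 0, 0], dtype=np.float32),  # used when no mapped key is held
)

# With live reward plotting (requires matplotlib)
def callback(obs_t, obs_tp1, action, rew, terminated, truncated, info):
    return [rew]  # one value per name in plot_names

plotter = PlayPlot(callback, 150, ["reward"])
play(
    gym.make("CartPole-v1", render_mode="rgb_array"),
    # Explicit mapping instead of relying on a built-in one: 0 = push left, 1 = push right
    keys_to_action={"a": 0, "d": 1},
    callback=plotter.callback,
)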
