
Implementation:Farama Foundation Gymnasium Env Subclass Interface

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, Environment_Design
Last Updated 2026-02-15 03:00 GMT

Overview

Concrete interface for building custom RL environments by subclassing gymnasium.Env, the abstract base class provided by the Gymnasium library.

Description

The gymnasium.Env base class is a generic abstract class parameterized by ObsType and ActType. Custom environments must subclass it and implement __init__, reset, and step, and may optionally implement render and close. The class also provides np_random (a seeded PRNG), spec (the EnvSpec populated by make()), and helpers such as the unwrapped property for traversing wrapper chains.

Usage

Subclass gymnasium.Env when creating any custom environment. Set observation_space and action_space in __init__, call super().reset(seed=seed) at the start of reset so that np_random is seeded, and return (observation, info) from reset and (observation, reward, terminated, truncated, info) from step.

Code Reference

Source Location

  • Repository: Gymnasium
  • File: gymnasium/core.py
  • Lines: L22-281

Signature

class Env(Generic[ObsType, ActType]):
    """The main Gymnasium class for implementing RL environments."""

    # Set in ALL subclasses
    action_space: spaces.Space[ActType]
    observation_space: spaces.Space[ObsType]

    # Set in SOME subclasses
    metadata: dict[str, Any] = {"render_modes": []}
    render_mode: str | None = None

    def step(self, action: ActType) -> tuple[ObsType, SupportsFloat, bool, bool, dict[str, Any]]:
        """Run one timestep of the environment's dynamics."""
        raise NotImplementedError

    def reset(self, *, seed: int | None = None, options: dict[str, Any] | None = None) -> tuple[ObsType, dict[str, Any]]:
        """Reset the environment to an initial state."""
        if seed is not None:
            self._np_random, self._np_random_seed = seeding.np_random(seed)

    def render(self) -> RenderFrame | list[RenderFrame] | None:
        """Compute render frames as specified by render_mode."""
        raise NotImplementedError

    def close(self) -> None:
        """Clean up resources (rendering windows, connections)."""
        pass

Import

import gymnasium as gym
from gymnasium import spaces
import numpy as np

class MyEnv(gym.Env):
    ...

I/O Contract

Inputs

Name Type Required Description
observation_space spaces.Space Yes Must be set in __init__
action_space spaces.Space Yes Must be set in __init__
metadata dict Recommended Should include "render_modes" key; defaults to {"render_modes": []}
render_mode str or None No Rendering mode from metadata

Outputs

Name Type Description
step() tuple[ObsType, float, bool, bool, dict] (obs, reward, terminated, truncated, info)
reset() tuple[ObsType, dict] (initial_obs, info)
render() ndarray or None Render frame or None

Usage Examples

Grid World Environment

import gymnasium as gym
from gymnasium import spaces
import numpy as np

class GridWorldEnv(gym.Env):
    metadata = {"render_modes": ["rgb_array"], "render_fps": 4}

    def __init__(self, size=5, render_mode=None):
        self.size = size
        self.render_mode = render_mode

        # Define spaces
        self.observation_space = spaces.Dict({
            "agent": spaces.Box(0, size - 1, shape=(2,), dtype=int),
            "target": spaces.Box(0, size - 1, shape=(2,), dtype=int),
        })
        self.action_space = spaces.Discrete(4)  # up, right, down, left

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._agent_location = self.np_random.integers(0, self.size, size=2)
        self._target_location = self.np_random.integers(0, self.size, size=2)
        while np.array_equal(self._agent_location, self._target_location):
            self._target_location = self.np_random.integers(0, self.size, size=2)
        observation = {"agent": self._agent_location, "target": self._target_location}
        return observation, {}

    def step(self, action):
        # Map the discrete action to a movement direction on the grid.
        direction = {0: np.array([0, 1]), 1: np.array([1, 0]),
                     2: np.array([0, -1]), 3: np.array([-1, 0])}
        self._agent_location = np.clip(
            self._agent_location + direction[action], 0, self.size - 1
        )
        terminated = np.array_equal(self._agent_location, self._target_location)
        reward = 1.0 if terminated else 0.0
        observation = {"agent": self._agent_location, "target": self._target_location}
        return observation, reward, terminated, False, {}

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
