Implementation: Farama Foundation Gymnasium Env Subclass Interface
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Environment_Design |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
Concrete interface for building custom RL environments by subclassing the gymnasium.Env base class provided by the Gymnasium library.
Description
The gymnasium.Env base class is a generic abstract class parameterized by ObsType and ActType. Custom environments must subclass it and implement __init__, reset, step, and optionally render and close. The class provides np_random (seeded PRNG), spec (EnvSpec from make()), and wrapper attribute traversal methods.
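The np_random attribute is a seeded numpy Generator. As a rough sketch of how Gymnasium constructs it (gymnasium.utils.seeding.np_random wraps a numpy Generator backed by PCG64; the helper below is a reconstruction for illustration, not the library's exact code):

```python
import numpy as np

# Illustrative reconstruction of gymnasium.utils.seeding.np_random:
# build a numpy Generator from a SeedSequence so seeding is reproducible.
def np_random(seed=None):
    seed_seq = np.random.SeedSequence(seed)
    rng = np.random.Generator(np.random.PCG64(seed_seq))
    return rng, seed_seq.entropy

rng_a, _ = np_random(42)
rng_b, _ = np_random(42)
# The same seed reproduces the same stream of draws
assert rng_a.integers(0, 100, size=5).tolist() == rng_b.integers(0, 100, size=5).tolist()
```

This is why calling super().reset(seed=seed) makes an environment's randomness reproducible: it replaces self.np_random with a freshly seeded Generator.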
Usage
Subclass gymnasium.Env when creating any custom environment. Set observation_space and action_space in __init__, call super().reset(seed=seed) at the start of reset, and return the correct tuple types from step and reset.
Code Reference
Source Location
- Repository: Gymnasium
- File: gymnasium/core.py
- Lines: L22-281
Signature
```python
class Env(Generic[ObsType, ActType]):
    """The main Gymnasium class for implementing RL environments."""

    # Set in ALL subclasses
    action_space: spaces.Space[ActType]
    observation_space: spaces.Space[ObsType]

    # Set in SOME subclasses
    metadata: dict[str, Any] = {"render_modes": []}
    render_mode: str | None = None

    def step(self, action: ActType) -> tuple[ObsType, SupportsFloat, bool, bool, dict[str, Any]]:
        """Run one timestep of the environment's dynamics."""
        raise NotImplementedError

    def reset(self, *, seed: int | None = None, options: dict[str, Any] | None = None) -> tuple[ObsType, dict[str, Any]]:
        """Reset the environment to an initial state."""
        if seed is not None:
            self._np_random, self._np_random_seed = seeding.np_random(seed)

    def render(self) -> RenderFrame | list[RenderFrame] | None:
        """Compute render frames as specified by render_mode."""
        raise NotImplementedError

    def close(self) -> None:
        """Clean up resources (rendering windows, connections)."""
        pass
```
Import
```python
import gymnasium as gym
from gymnasium import spaces
import numpy as np

class MyEnv(gym.Env):
    ...
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| observation_space | spaces.Space | Yes | Must be set in __init__ |
| action_space | spaces.Space | Yes | Must be set in __init__ |
| metadata | dict | No | Has a default ({"render_modes": []}); should list supported render modes |
| render_mode | str or None | No | Rendering mode from metadata |
Outputs
| Name | Type | Description |
|---|---|---|
| step() | tuple[ObsType, float, bool, bool, dict] | (obs, reward, terminated, truncated, info) |
| reset() | tuple[ObsType, dict] | (initial_obs, info) |
| render() | ndarray or None | Render frame or None |
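The tuples above drive the standard rollout loop. A sketch using a hypothetical CountdownEnv that follows the same contract without depending on Gymnasium (the environment and its reward scheme are illustrative):

```python
# Hypothetical minimal environment obeying the reset()/step() tuple contract.
class CountdownEnv:
    """Terminates once a counter reaches zero; never truncates."""

    def reset(self, seed=None, options=None):
        self._count = 3
        return self._count, {}          # (initial_obs, info)

    def step(self, action):
        self._count -= 1
        terminated = self._count == 0   # natural end of the episode
        truncated = False               # no time limit here
        reward = 1.0 if terminated else 0.0
        return self._count, reward, terminated, truncated, {}

def rollout(env):
    """Run one episode, returning the total reward."""
    obs, info = env.reset(seed=0)
    total = 0.0
    while True:
        obs, reward, terminated, truncated, info = env.step(0)
        total += reward
        if terminated or truncated:     # episode ends on either flag
            return total

print(rollout(CountdownEnv()))          # 1.0: reward earned at termination
```

Agents should treat terminated and truncated differently when bootstrapping values, but both flags end the episode and require a reset() before the next step().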
Usage Examples
Grid World Environment
```python
import gymnasium as gym
from gymnasium import spaces
import numpy as np

class GridWorldEnv(gym.Env):
    metadata = {"render_modes": ["rgb_array"], "render_fps": 4}

    def __init__(self, size=5, render_mode=None):
        self.size = size
        self.render_mode = render_mode
        # Define spaces
        self.observation_space = spaces.Dict({
            "agent": spaces.Box(0, size - 1, shape=(2,), dtype=int),
            "target": spaces.Box(0, size - 1, shape=(2,), dtype=int),
        })
        self.action_space = spaces.Discrete(4)  # up, right, down, left

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random
        self._agent_location = self.np_random.integers(0, self.size, size=2)
        self._target_location = self.np_random.integers(0, self.size, size=2)
        # Resample the target until it differs from the agent's start
        while np.array_equal(self._agent_location, self._target_location):
            self._target_location = self.np_random.integers(0, self.size, size=2)
        observation = {"agent": self._agent_location, "target": self._target_location}
        return observation, {}

    def step(self, action):
        # Map each discrete action to a unit move on the grid
        direction = {0: np.array([0, 1]), 1: np.array([1, 0]),
                     2: np.array([0, -1]), 3: np.array([-1, 0])}
        # Clip so the agent cannot leave the grid
        self._agent_location = np.clip(
            self._agent_location + direction[action], 0, self.size - 1
        )
        terminated = np.array_equal(self._agent_location, self._target_location)
        reward = 1.0 if terminated else 0.0
        observation = {"agent": self._agent_location, "target": self._target_location}
        return observation, reward, terminated, False, {}
```