
Principle:Farama Foundation Gymnasium Custom Environment Implementation

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, Environment_Design
Last Updated 2026-02-15 03:00 GMT

Overview

The practice of creating new RL environments by subclassing a base environment class and implementing required interface methods.

Description

Custom Environment Implementation is the process of defining a new RL environment by inheriting from gymnasium.Env and implementing the required interface:

  • __init__: Set observation_space, action_space, and metadata (including render_modes).
  • reset(seed=None, options=None): Initialize episode state, call super().reset(seed=seed) to seed the self.np_random generator, and return (observation, info).
  • step(action): Apply action to state, compute reward and termination conditions, return (observation, reward, terminated, truncated, info).
  • render(): Optionally return visual representation based on render_mode.

The implementation must satisfy several contracts:

  • Every observation returned by reset and step must be contained in observation_space
  • step must accept every action drawn from action_space
  • reset must be called before the first call to step
  • terminated and truncated must be Python booleans
  • info must be a dict

Usage

Use this principle when you need an environment that does not exist in the Gymnasium registry. Common use cases include custom game environments, robotics simulations, real-world system interfaces, and research environments with novel dynamics.

Theoretical Basis

The custom environment implements a Markov Decision Process (MDP) or Partially Observable MDP (POMDP):

# Required interface (abstract; define_obs_space, transition, etc. are placeholders)
class CustomEnv(gymnasium.Env):
    metadata = {"render_modes": ["human", "rgb_array"]}

    def __init__(self, render_mode=None):
        self.render_mode = render_mode
        self.observation_space = define_obs_space()
        self.action_space = define_action_space()

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random
        self.state = initial_state(self.np_random)
        return observation(self.state), {}

    def step(self, action):
        next_state = transition(self.state, action)
        reward = reward_function(self.state, action, next_state)
        terminated = is_terminal(next_state)
        self.state = next_state
        return observation(self.state), reward, terminated, False, {}

Related Pages

Implemented By

Uses Heuristic
