Principle: Farama Foundation Gymnasium Custom Environment Implementation
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Environment_Design |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
The practice of creating new RL environments by subclassing a base environment class and implementing required interface methods.
Description
Custom Environment Implementation is the process of defining a new RL environment by inheriting from gymnasium.Env and implementing the required interface:
- __init__: Set observation_space, action_space, and metadata (including render_modes).
- reset(seed, options): Initialize state, call super().reset(seed=seed) for PRNG setup, return (observation, info).
- step(action): Apply action to state, compute reward and termination conditions, return (observation, reward, terminated, truncated, info).
- render(): Optionally return visual representation based on render_mode.
The implementation must satisfy several contracts:
- Every observation returned by reset and step must be contained in observation_space
- step must accept any action contained in action_space
- reset must be called before the first call to step
- terminated and truncated must be boolean values
- The info return value must be a dict
Usage
Use this principle when you need an environment that does not exist in the Gymnasium registry. Common use cases include custom game environments, robotics simulations, real-world system interfaces, and research environments with novel dynamics.
Theoretical Basis
The custom environment implements a Markov Decision Process (MDP) or Partially Observable MDP (POMDP):
```python
# Required interface (abstract sketch; define_obs_space, transition,
# reward_function, observation, etc. are placeholders)
class CustomEnv(gymnasium.Env):
    metadata = {"render_modes": ["human", "rgb_array"]}

    def __init__(self, render_mode=None):
        self.render_mode = render_mode
        self.observation_space = define_obs_space()
        self.action_space = define_action_space()

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random
        self.state = initial_state(self.np_random)
        return observation(self.state), {}

    def step(self, action):
        next_state = transition(self.state, action)
        reward = reward_function(self.state, action, next_state)
        terminated = is_terminal(next_state)
        self.state = next_state
        return observation(next_state), reward, terminated, False, {}
```