Implementation: Farama Foundation Gymnasium Env Step Reset
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, MDP |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
Concrete tool for agent-environment interaction via step and reset methods provided by the Gymnasium Env base class.
Description
The gymnasium.Env abstract base class defines the standard RL environment interface. The step(action) method advances the environment by one timestep, returning a 5-tuple of (observation, reward, terminated, truncated, info). The reset() method initializes the environment to a starting state, returning a 2-tuple of (observation, info). The class is generic over ObsType and ActType for type-safe usage.
Usage
These methods are used in every RL interaction loop. Call reset() once at the start and after each episode ends (terminated or truncated). Call step(action) to advance the environment.
Code Reference
Source Location
- Repository: Gymnasium
- File: gymnasium/core.py
- Lines: L22-281
Signature
class Env(Generic[ObsType, ActType]):
    # Required attributes
    action_space: spaces.Space[ActType]
    observation_space: spaces.Space[ObsType]
    metadata: dict[str, Any] = {"render_modes": []}
    render_mode: str | None = None

    def step(
        self, action: ActType
    ) -> tuple[ObsType, SupportsFloat, bool, bool, dict[str, Any]]:
        """Run one timestep of the environment's dynamics.

        Args:
            action: An action provided by the agent.

        Returns:
            observation: Next observation.
            reward: Reward for taking the action.
            terminated: Whether the agent reached a terminal state (MDP).
            truncated: Whether the episode was truncated (e.g., time limit).
            info: Auxiliary diagnostic information.
        """

    def reset(
        self,
        *,
        seed: int | None = None,
        options: dict[str, Any] | None = None,
    ) -> tuple[ObsType, dict[str, Any]]:
        """Reset the environment to an initial state.

        Args:
            seed: Seed for the environment's PRNG.
            options: Additional reset options.

        Returns:
            observation: Initial observation.
            info: Auxiliary information.
        """
Import
import gymnasium as gym
env = gym.make("CartPole-v1")
# env inherits from gymnasium.Env
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| action (step) | ActType | Yes | Action to take in the environment |
| seed (reset) | int or None | No | PRNG seed for reproducibility |
| options (reset) | dict or None | No | Environment-specific reset options |
Outputs
| Name | Type | Description |
|---|---|---|
| step() returns | tuple[ObsType, SupportsFloat, bool, bool, dict] | (observation, reward, terminated, truncated, info) |
| reset() returns | tuple[ObsType, dict] | (initial_observation, info) |
Usage Examples
Standard Interaction Loop
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=42)
total_reward = 0
terminated, truncated = False, False
while not (terminated or truncated):
    action = env.action_space.sample()  # Random agent
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
print(f"Episode reward: {total_reward}")
env.close()
Multi-Episode Training
import gymnasium as gym
import numpy as np

env = gym.make("Blackjack-v1")
rewards = []
for episode in range(1000):
    obs, info = env.reset()
    episode_reward = 0
    terminated, truncated = False, False
    while not (terminated or truncated):
        # Epsilon-greedy policy
        if np.random.random() < 0.1:
            action = env.action_space.sample()
        else:
            action = 0  # Stand (greedy)
        obs, reward, terminated, truncated, info = env.step(action)
        episode_reward += reward
    rewards.append(episode_reward)
env.close()