
Implementation:Farama Foundation Gymnasium Env Step Reset

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, MDP
Last Updated 2026-02-15 03:00 GMT

Overview

Concrete interface for agent-environment interaction via the step and reset methods provided by the Gymnasium Env base class.

Description

The gymnasium.Env abstract base class defines the standard RL environment interface. The step(action) method advances the environment by one timestep, returning a 5-tuple of (observation, reward, terminated, truncated, info). The reset() method initializes the environment to a starting state, returning a 2-tuple of (observation, info). The class is generic over ObsType and ActType for type-safe usage.

Usage

These methods drive every RL interaction loop. Call reset() once at the start and again after each episode ends, i.e. when step() returns terminated=True (the agent reached a terminal MDP state) or truncated=True (the episode was cut off externally, for example by a time limit). Call step(action) to advance the environment by one timestep.

Code Reference

Source Location

  • Repository: Gymnasium
  • File: gymnasium/core.py
  • Lines: L22-281

Signature

class Env(Generic[ObsType, ActType]):
    # Required attributes
    action_space: spaces.Space[ActType]
    observation_space: spaces.Space[ObsType]
    metadata: dict[str, Any] = {"render_modes": []}
    render_mode: str | None = None

    def step(
        self, action: ActType
    ) -> tuple[ObsType, SupportsFloat, bool, bool, dict[str, Any]]:
        """Run one timestep of the environment's dynamics.

        Args:
            action: An action provided by the agent.

        Returns:
            observation: Next observation.
            reward: Reward for taking the action.
            terminated: Whether the agent reached a terminal state (MDP).
            truncated: Whether the episode was truncated (e.g., time limit).
            info: Auxiliary diagnostic information.
        """

    def reset(
        self,
        *,
        seed: int | None = None,
        options: dict[str, Any] | None = None,
    ) -> tuple[ObsType, dict[str, Any]]:
        """Reset the environment to an initial state.

        Args:
            seed: Seed for the environment's PRNG.
            options: Additional reset options.

        Returns:
            observation: Initial observation.
            info: Auxiliary information.
        """

Import

import gymnasium as gym

env = gym.make("CartPole-v1")
# env inherits from gymnasium.Env

I/O Contract

Inputs

Name Type Required Description
action (step) ActType Yes Action to take in the environment
seed (reset) int or None No PRNG seed for reproducibility
options (reset) dict or None No Environment-specific reset options

Outputs

Name Type Description
step() returns tuple[ObsType, SupportsFloat, bool, bool, dict] (observation, reward, terminated, truncated, info)
reset() returns tuple[ObsType, dict] (initial_observation, info)

Usage Examples

Standard Interaction Loop

import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=42)

total_reward = 0
terminated, truncated = False, False

while not (terminated or truncated):
    action = env.action_space.sample()  # Random agent
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward

print(f"Episode reward: {total_reward}")
env.close()

Multi-Episode Training

import gymnasium as gym
import numpy as np

env = gym.make("Blackjack-v1")
rewards = []

for episode in range(1000):
    obs, info = env.reset()
    episode_reward = 0
    terminated, truncated = False, False

    while not (terminated or truncated):
        # Epsilon-greedy: explore with probability 0.1; otherwise act
        # greedily (here a fixed placeholder action; a learned policy
        # would supply the greedy choice)
        if np.random.random() < 0.1:
            action = env.action_space.sample()
        else:
            action = 0  # Stand
        obs, reward, terminated, truncated, info = env.step(action)
        episode_reward += reward

    rewards.append(episode_reward)

env.close()

