Implementation: Farama Foundation Gymnasium Env Step Reset
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, MDP |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
Concrete tool for agent-environment interaction via step and reset methods provided by the Gymnasium Env base class.
Description
The gymnasium.Env abstract base class defines the standard RL environment interface. The step(action) method advances the environment by one timestep, returning a 5-tuple of (observation, reward, terminated, truncated, info). The reset() method initializes the environment to a starting state, returning a 2-tuple of (observation, info). The class is generic over ObsType and ActType for type-safe usage.
Usage
These methods are used in every RL interaction loop. Call reset() once at the start and after each episode ends (terminated or truncated). Call step(action) to advance the environment.
Code Reference
Source Location
- Repository: Gymnasium
- File: gymnasium/core.py
- Lines: L22-281
Signature
class Env(Generic[ObsType, ActType]):
    # Required attributes
    action_space: spaces.Space[ActType]
    observation_space: spaces.Space[ObsType]
    metadata: dict[str, Any] = {"render_modes": []}
    render_mode: str | None = None

    def step(
        self, action: ActType
    ) -> tuple[ObsType, SupportsFloat, bool, bool, dict[str, Any]]:
        """Run one timestep of the environment's dynamics.

        Args:
            action: An action provided by the agent.

        Returns:
            observation: Next observation.
            reward: Reward for taking the action.
            terminated: Whether the agent reached a terminal state (MDP).
            truncated: Whether the episode was truncated (e.g., time limit).
            info: Auxiliary diagnostic information.
        """

    def reset(
        self,
        *,
        seed: int | None = None,
        options: dict[str, Any] | None = None,
    ) -> tuple[ObsType, dict[str, Any]]:
        """Reset the environment to an initial state.

        Args:
            seed: Seed for the environment's PRNG.
            options: Additional reset options.

        Returns:
            observation: Initial observation.
            info: Auxiliary information.
        """
Import
import gymnasium as gym
env = gym.make("CartPole-v1")
# env inherits from gymnasium.Env
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| action (step) | ActType | Yes | Action to take in the environment |
| seed (reset) | int or None | No | PRNG seed for reproducibility |
| options (reset) | dict or None | No | Environment-specific reset options |
Outputs
| Name | Type | Description |
|---|---|---|
| step() returns | tuple[ObsType, SupportsFloat, bool, bool, dict] | (observation, reward, terminated, truncated, info) |
| reset() returns | tuple[ObsType, dict] | (initial_observation, info) |
Usage Examples
Standard Interaction Loop
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=42)
total_reward = 0
terminated, truncated = False, False
while not (terminated or truncated):
    action = env.action_space.sample()  # Random agent
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
print(f"Episode reward: {total_reward}")
env.close()
Multi-Episode Training
import gymnasium as gym
import numpy as np

env = gym.make("Blackjack-v1")
rewards = []
for episode in range(1000):
    obs, info = env.reset()
    episode_reward = 0
    terminated, truncated = False, False
    while not (terminated or truncated):
        # Epsilon-greedy policy
        if np.random.random() < 0.1:
            action = env.action_space.sample()
        else:
            action = 0  # Stand (greedy)
        obs, reward, terminated, truncated, info = env.step(action)
        episode_reward += reward
    rewards.append(episode_reward)
env.close()