Principle:Farama Foundation Gymnasium Batched Environment Interaction

Knowledge Sources	Gymnasium Vector Envs
Domains	Reinforcement_Learning, Parallelism
Last Updated	2026-02-15 03:00 GMT

Overview

An extension of the standard environment interaction protocol that operates on batches of observations and actions across multiple parallel environments.

Description

Batched Environment Interaction extends the single-environment step/reset protocol to vector environments. The key differences from single-environment interaction:

reset() returns a batch of observations of shape (num_envs, *obs_shape) instead of (*obs_shape), together with an infos dictionary
step(actions) accepts actions of shape (num_envs, *act_shape) and returns batched observations plus rewards, terminateds, and truncateds arrays of shape (num_envs,), and an infos dictionary
Autoreset: a sub-environment that terminates or truncates is reset automatically, without a manual reset() call, and the first observation of its new episode is returned by the next step() call

The batched interface lets a policy network compute actions for all sub-environments in a single forward pass, which uses a GPU more efficiently than running one forward pass per environment.
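
As a rough illustration of the single forward pass, the sketch below uses a hypothetical linear policy in NumPy. The names batched_policy, W, and b and all sizes are assumptions made for this example; a real agent would typically use a neural network on an accelerator.

```python
import numpy as np

rng = np.random.default_rng(0)
num_envs, obs_dim, num_actions = 8, 4, 2  # illustrative sizes

# Hypothetical linear policy parameters (stand-in for a neural network).
W = rng.normal(size=(obs_dim, num_actions))
b = np.zeros(num_actions)

def batched_policy(obs):
    """Map observations of shape (num_envs, obs_dim) to actions of shape (num_envs,)."""
    logits = obs @ W + b          # one matrix product covers every sub-environment
    return logits.argmax(axis=1)  # greedy action per sub-environment

# Stand-in for a batch of observations returned by reset() or step().
obs = rng.normal(size=(num_envs, obs_dim))
actions = batched_policy(obs)
print(actions.shape)
```

The resulting actions array already has the (num_envs,) layout that step(actions) expects for a discrete action space.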

Usage

Use this protocol when interacting with VectorEnv instances for deep RL training. The batched interface is used by A2C, PPO, and other on-policy algorithms that collect fixed-length rollouts from multiple environments.

Theoretical Basis

Batched MDP interaction:

$\{o_{i}, r_{i}, d_{i}, t_{i}\}_{i = 1}^{N} = \text{envs.step}(\{a_{i}\}_{i = 1}^{N})$

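Here $d_{i}$ is the terminated flag and $t_{i}$ the truncated flag of environment $i$. With automatic reset: when $d_{i} = \text{True}$ or $t_{i} = \text{True}$, the next call to step returns the observation from the auto-reset for environment $i$.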
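
The next-step reset behaviour described above can be traced with a toy stand-in. ToyVecEnv below is a hypothetical class written for this illustration and is not Gymnasium's implementation. Each sub-environment simply counts steps and terminates at its own horizon, and the choices to ignore the action and return zero reward on the reset step are assumptions of the sketch.

```python
import numpy as np

class ToyVecEnv:
    """Hypothetical toy vector env: sub-env i counts steps and terminates at horizons[i]."""

    def __init__(self, horizons):
        self.horizons = np.asarray(horizons, dtype=np.int64)
        self.num_envs = len(horizons)
        self.counters = np.zeros(self.num_envs, dtype=np.int64)
        self.needs_reset = np.zeros(self.num_envs, dtype=bool)

    def reset(self):
        self.counters[:] = 0
        self.needs_reset[:] = False
        return self.counters.copy(), {}

    def step(self, actions):
        actions = np.asarray(actions)
        assert actions.shape == (self.num_envs,)
        # Sub-envs that finished on the previous step are reset now (action ignored).
        reset_now = self.needs_reset.copy()
        self.counters[reset_now] = 0
        # All other sub-envs advance by one step.
        self.counters[~reset_now] += 1
        terminateds = (~reset_now) & (self.counters >= self.horizons)
        truncateds = np.zeros(self.num_envs, dtype=bool)
        rewards = np.where(reset_now, 0.0, 1.0)
        # Remember which sub-envs must be reset on the next call.
        self.needs_reset = terminateds | truncateds
        return self.counters.copy(), rewards, terminateds, truncateds, {}

envs = ToyVecEnv(horizons=[2, 3])
obs, _ = envs.reset()
trace = []
for _ in range(4):
    dummy_actions = np.zeros(envs.num_envs, dtype=np.int64)
    obs, rewards, terminateds, truncateds, _ = envs.step(dummy_actions)
    trace.append((obs.tolist(), terminateds.tolist()))

for row in trace:
    print(row)
```

In the trace, sub-environment 0 reports termination on the second step, and its observation returns to the initial value 0 on the third step while sub-environment 1 keeps running.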

Related Pages

Implemented By

Implementation:Farama_Foundation_Gymnasium_VectorEnv_Step_Reset
