Principle:Farama Foundation Gymnasium Batched Environment Interaction
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Parallelism |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
An extension of the standard environment interaction protocol that operates on batches of observations and actions across multiple parallel environments.
Description
Batched Environment Interaction extends the single-environment step/reset protocol to vector environments. The key differences from single-environment interaction:
- reset() returns observations of shape (num_envs, *obs_shape) instead of (*obs_shape), together with an infos dictionary
- step(actions) accepts actions of shape (num_envs, *act_shape) and returns batched observations, rewards, termination flags, and truncation flags, each with a leading num_envs dimension, plus an infos dictionary
- Autoreset: a sub-environment that terminates or truncates is reset automatically, and the observation from that reset is returned by the next call to step()
The batched interface lets the policy network process the observations from all sub-environments in a single forward pass, which uses a GPU far more efficiently than one forward pass per environment.
Usage
Use this protocol when interacting with VectorEnv instances for deep RL training. The batched interface is used by A2C, PPO, and other on-policy algorithms that collect fixed-length rollouts from multiple environments.
Theoretical Basis
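Batched MDP interaction: for $N$ sub-environments indexed by $i = 1, \dots, N$, one call to step maps a batch of actions to a batch of transitions:

$$\left(a_t^{(1)}, \dots, a_t^{(N)}\right) \mapsto \left(o_{t+1}^{(i)},\ r_t^{(i)},\ \mathrm{terminated}_t^{(i)},\ \mathrm{truncated}_t^{(i)}\right)_{i=1}^{N}$$

With automatic reset: when $\mathrm{terminated}_t^{(i)} \lor \mathrm{truncated}_t^{(i)}$ is true, the next call to step returns the observation from the auto-reset of environment $i$.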
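The next-step autoreset rule can be made concrete with a dependency-free toy. `ToyVectorEnv` is a hypothetical class written for this illustration and is not Gymnasium's implementation: each sub-environment is a counter that terminates once it reaches `horizon`, and a terminated sub-environment ignores its next action and returns its reset observation instead.

```python
class ToyVectorEnv:
    """Hypothetical batch of counters; an episode ends when a counter reaches `horizon`."""

    def __init__(self, num_envs, horizon):
        self.num_envs = num_envs
        self.horizon = horizon
        self.state = [0] * num_envs
        self.needs_reset = [False] * num_envs

    def reset(self):
        self.state = [0] * self.num_envs
        self.needs_reset = [False] * self.num_envs
        return list(self.state)

    def step(self, actions):
        obs, rewards, terminated = [], [], []
        for i in range(self.num_envs):
            if self.needs_reset[i]:
                # Episode ended on the previous call: ignore the action,
                # reset, and return the fresh initial observation.
                self.state[i] = 0
                self.needs_reset[i] = False
                obs.append(0)
                rewards.append(0.0)
                terminated.append(False)
            else:
                # Ordinary transition: the action is the counter increment.
                self.state[i] += actions[i]
                done = self.state[i] >= self.horizon
                self.needs_reset[i] = done
                obs.append(self.state[i])
                rewards.append(1.0)
                terminated.append(done)
        return obs, rewards, terminated


env = ToyVectorEnv(num_envs=2, horizon=2)
first_obs = env.reset()
# Environment 0 advances by 1 per step, environment 1 by 2, so they
# terminate at different times and are reset independently.
trace = [env.step([1, 2]) for _ in range(3)]
```

In the resulting trace, environment 1 terminates on the first call and reports its reset observation on the second, while environment 0 keeps stepping undisturbed until its own episode ends.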