Principle:Farama Foundation Gymnasium Batched Environment Interaction

From Leeroopedia
Revision as of 17:46, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Farama_Foundation_Gymnasium_Batched_Environment_Interaction.md)
Knowledge Sources
Domains Reinforcement_Learning, Parallelism
Last Updated 2026-02-15 03:00 GMT

Overview

An extension of the standard environment interaction protocol that operates on batches of observations and actions across multiple parallel environments.

Description

Batched Environment Interaction extends the single-environment step/reset protocol to vector environments. The key differences from single-environment interaction:

  • reset() returns observations of shape (num_envs, *obs_shape) instead of (*obs_shape), together with an infos dictionary
  • step(actions) accepts actions of shape (num_envs, *act_shape) and returns batched observations, plus rewards, terminateds, and truncateds of shape (num_envs,), and infos
  • Autoreset: Sub-environments reset automatically after they terminate or truncate, with the first observation of the new episode available at the next step
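
The differences listed above can be illustrated with a minimal sketch. ToyVectorEnv below is a hypothetical NumPy stand-in, not a Gymnasium class: each sub-environment is a counter that terminates once it reaches an assumed horizon, and the sketch assumes the reset is performed on the step call that follows termination.

```python
import numpy as np


class ToyVectorEnv:
    """Hypothetical batched environment (not part of Gymnasium).

    Each sub-environment is a counter that terminates at `horizon`.
    """

    def __init__(self, num_envs, horizon=3):
        self.num_envs = num_envs
        self.horizon = horizon
        self.counts = np.zeros(num_envs, dtype=np.int64)
        # Marks sub-environments that finished on the previous step call.
        self.needs_reset = np.zeros(num_envs, dtype=bool)

    def _obs(self):
        # One observation row per sub-environment: shape (num_envs, 1).
        return self.counts.astype(np.float32)[:, None]

    def reset(self):
        self.counts[:] = 0
        self.needs_reset[:] = False
        return self._obs(), {}

    def step(self, actions):
        actions = np.asarray(actions)
        assert actions.shape == (self.num_envs,)
        # Assumed autoreset behaviour: finished sub-environments are reset
        # on this call and ignore their action; the others advance.
        resetting = self.needs_reset.copy()
        self.counts = np.where(resetting, 0, self.counts + actions)
        rewards = np.where(resetting, 0.0, actions.astype(np.float32))
        terminateds = (self.counts >= self.horizon) & ~resetting
        truncateds = np.zeros(self.num_envs, dtype=bool)
        self.needs_reset = terminateds | truncateds
        return self._obs(), rewards, terminateds, truncateds, {}


envs = ToyVectorEnv(num_envs=2, horizon=2)
obs, infos = envs.reset()
print(obs.shape)  # (2, 1), i.e. (num_envs, *obs_shape)

# Sub-environment 1 reaches the horizon first and is reset one call later.
for _ in range(2):
    obs, rewards, terminateds, truncateds, infos = envs.step(np.array([1, 2]))
    print(obs[:, 0], terminateds)
```

Every array returned by step has num_envs as its leading dimension, so the calling code never loops over individual sub-environments.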

The batched interface lets the policy network process the observations from all sub-environments in a single forward pass, which uses a GPU more efficiently than running inference once per environment.

Usage

Use this protocol when interacting with VectorEnv instances for deep RL training. The batched interface is used by A2C, PPO, and other on-policy algorithms that collect fixed-length rollouts from multiple environments.
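
A fixed-length rollout over several environments might be collected as sketched below. This is an illustration under stated assumptions rather than the method of any particular algorithm: StubVectorEnv, collect_rollout, and the linear weights policy are hypothetical names, the stub returns random observations and truncates every horizon steps, and actions are chosen greedily only to keep the example deterministic.

```python
import numpy as np


class StubVectorEnv:
    """Hypothetical stand-in for a vector environment (not a Gymnasium class)."""

    def __init__(self, num_envs, obs_dim=4, horizon=5, seed=0):
        self.num_envs, self.obs_dim, self.horizon = num_envs, obs_dim, horizon
        self.rng = np.random.default_rng(seed)
        self.t = np.zeros(num_envs, dtype=np.int64)

    def _obs(self):
        shape = (self.num_envs, self.obs_dim)
        return self.rng.standard_normal(shape).astype(np.float32)

    def reset(self):
        self.t[:] = 0
        return self._obs(), {}

    def step(self, actions):
        assert actions.shape == (self.num_envs,)
        self.t += 1
        rewards = np.ones(self.num_envs, dtype=np.float32)
        terminateds = np.zeros(self.num_envs, dtype=bool)
        truncateds = self.t >= self.horizon
        self.t[truncateds] = 0  # simplified autoreset of the step counter
        return self._obs(), rewards, terminateds, truncateds, {}


def collect_rollout(envs, weights, num_steps):
    """Collect `num_steps` transitions from every sub-environment."""
    n = envs.num_envs
    obs, _ = envs.reset()
    # Buffers are laid out as (num_steps, num_envs, ...).
    obs_buf = np.zeros((num_steps, n) + obs.shape[1:], dtype=np.float32)
    act_buf = np.zeros((num_steps, n), dtype=np.int64)
    rew_buf = np.zeros((num_steps, n), dtype=np.float32)
    done_buf = np.zeros((num_steps, n), dtype=bool)
    for t in range(num_steps):
        # One batched forward pass: (num_envs, obs_dim) @ (obs_dim, num_actions).
        logits = obs @ weights
        actions = logits.argmax(axis=1)  # greedy choice, for illustration only
        next_obs, rewards, terminateds, truncateds, _ = envs.step(actions)
        obs_buf[t], act_buf[t], rew_buf[t] = obs, actions, rewards
        done_buf[t] = terminateds | truncateds
        obs = next_obs
    return obs_buf, act_buf, rew_buf, done_buf


envs = StubVectorEnv(num_envs=8)
weights = np.random.default_rng(1).standard_normal((4, 2)).astype(np.float32)
obs_buf, act_buf, rew_buf, done_buf = collect_rollout(envs, weights, num_steps=16)
print(obs_buf.shape, act_buf.shape, rew_buf.shape, done_buf.shape)
```

Storing the rollout with the time step as the first axis and the environment index as the second keeps each sub-environment's trajectory contiguous along one axis, and done_buf marks where episodes end inside the rollout.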

Theoretical Basis

Batched MDP interaction:

$$\{o_i, r_i, d_i, t_i\}_{i=1}^{N} = \text{envs.step}\left(\{a_i\}_{i=1}^{N}\right)$$

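With automatic reset: when $d_i$ or $t_i$ is True (environment $i$ terminated or was truncated), the next call to step works with the observation produced by the automatic reset of environment $i$ rather than continuing the finished episode.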

Related Pages

Implemented By
