
Principle:Farama Foundation Gymnasium Array Backend Conversion

From Leeroopedia
Knowledge Sources
Domains: Reinforcement_Learning, Framework_Interoperability
Last Updated: 2026-02-15 03:00 GMT

Overview

Transparent conversion of environment inputs and outputs between different array computation backends enables seamless interoperability across deep learning frameworks.

Description

Array backend conversion addresses the practical challenge that reinforcement learning environments typically produce NumPy arrays, while modern RL algorithms may be implemented in JAX, PyTorch, or other array-compatible frameworks. Without conversion layers, developers must manually convert between array types at every environment boundary, leading to verbose and error-prone code. Array conversion wrappers automate this translation, transparently converting actions from the framework's native array type to the environment's expected type (typically NumPy), and converting observations, rewards, and other return values back to the framework's array type.

The conversion system supports the Array API standard, providing a general-purpose mechanism that works with any Array API-compatible framework. Specialized converters exist for common pairs: JAX-to-NumPy, NumPy-to-PyTorch, and JAX-to-PyTorch. Each converter recursively traverses nested data structures (dicts, tuples, lists) to convert all array elements, while leaving non-array types (strings, booleans, None values) unchanged. The converters also handle device placement, ensuring that converted tensors end up on the correct compute device (CPU, GPU).

Both single-environment and vectorized environment versions of the conversion wrappers are provided. The vectorized versions are particularly important because they allow batched observations from multiple parallel environments to be converted in a single call and consumed directly by GPU-based neural networks, without per-environment conversion code. Where the source and target frameworks can share memory (for example through DLPack), the conversion itself can avoid copying data.
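
As a rough illustration of the batching argument, the sketch below stacks observations from several environments and converts the batch once, then compares that with converting each observation separately. It uses NumPy only. The function to_agent_array is a hypothetical stand-in for a framework conversion call (it is not part of Gymnasium), and the call counter exists purely to make the difference visible.

```python
import numpy as np


def to_agent_array(batch, dtype=np.float32):
    # Hypothetical stand-in for a framework conversion call.
    # Each call represents one conversion at the environment boundary.
    to_agent_array.calls += 1
    return np.asarray(batch, dtype=dtype)


to_agent_array.calls = 0

num_envs, obs_dim = 4, 3
# One float64 observation per environment, as a NumPy environment might return.
per_env_obs = [np.full(obs_dim, i, dtype=np.float64) for i in range(num_envs)]

# Vectorized path: stack first, then convert the whole batch in one call.
batched = to_agent_array(np.stack(per_env_obs))
calls_batched = to_agent_array.calls

# Per-environment path: one conversion per observation, stacked afterwards.
to_agent_array.calls = 0
converted_individually = np.stack([to_agent_array(o) for o in per_env_obs])
calls_individual = to_agent_array.calls
```

Both paths produce the same batch; the vectorized path simply reaches it with one conversion instead of one per environment.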

Usage

Use array backend conversion wrappers when your RL algorithm is implemented in a different framework than the environment. Use JAX-to-NumPy when a JAX-based algorithm interacts with a standard NumPy environment. Use NumPy-to-PyTorch when feeding environment outputs directly into PyTorch models. Use the general ArrayConversion wrapper when working with Array API-compatible frameworks or when you need framework-agnostic conversion. Use the vector versions when running multiple environments in parallel for batch training.

Theoretical Basis

Array backend conversion is a structural transformation that preserves the mathematical content of the data while changing its representation. For an observation $o$ represented as a NumPy array, the conversion to a PyTorch tensor is:

$\mathrm{convert} : \texttt{numpy.ndarray} \to \texttt{torch.Tensor}$

such that for all indices $i$: $\mathrm{convert}(o)[i] = o[i]$.

The conversion must be applied recursively to handle nested observation structures:

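def is_array(data):
    # Assumption: an object counts as an array if it exposes the Array API
    # namespace hook or the NumPy array protocol.
    return hasattr(data, "__array_namespace__") or hasattr(data, "__array__")

def convert(data, target_namespace, device=None):
    if is_array(data):
        # Convert the leaf array into the target framework, optionally on a device
        return target_namespace.asarray(data, device=device)
    elif isinstance(data, dict):
        return {k: convert(v, target_namespace, device) for k, v in data.items()}
    elif isinstance(data, tuple):
        return tuple(convert(v, target_namespace, device) for v in data)
    elif isinstance(data, list):
        return [convert(v, target_namespace, device) for v in data]
    else:
        return data  # leave non-array types unchanged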
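
A small runnable sketch of this recursion on a nested observation follows. So that it runs with NumPy alone, the target framework is replaced by a hypothetical stand-in: FakeTensor and fake_backend are illustrative names, not real library objects, and the device string is only recorded as a label (no GPU is involved). Here is_array is assumed to recognize NumPy arrays only.

```python
import numpy as np
from types import SimpleNamespace


class FakeTensor:
    """Hypothetical stand-in for a framework tensor such as torch.Tensor."""

    def __init__(self, values, device):
        self.values = np.array(values)  # copies the data
        self.device = device            # recorded as a label only


# Hypothetical target namespace exposing an asarray(data, device=...) function.
fake_backend = SimpleNamespace(
    asarray=lambda data, device=None: FakeTensor(data, device or "cpu")
)


def is_array(data):
    # Assumption for this sketch: only NumPy arrays count as arrays.
    return isinstance(data, np.ndarray)


def convert(data, target_namespace, device=None):
    if is_array(data):
        return target_namespace.asarray(data, device=device)
    if isinstance(data, dict):
        return {k: convert(v, target_namespace, device) for k, v in data.items()}
    if isinstance(data, tuple):
        return tuple(convert(v, target_namespace, device) for v in data)
    if isinstance(data, list):
        return [convert(v, target_namespace, device) for v in data]
    return data  # strings, booleans, None, and other non-arrays pass through


obs = {
    "position": np.array([1.0, 2.0]),
    "extras": ("goal", np.array([3]), None),
    "done": False,
}
out = convert(obs, fake_backend, device="cuda:0")
```

After the call, the container shapes (dict, tuple) are unchanged, every array leaf has become a FakeTensor carrying the requested device label with the same element values, and the string, None, and boolean entries are returned as-is.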

The wrapper intercepts the environment interface at two points:

# On step input (action conversion; reset takes no action):
env_action = convert(agent_action, target_namespace=env_framework)

# On step/reset output (observation conversion; the reward is returned by step only):
agent_obs = convert(env_obs, target_namespace=agent_framework, device=device)
agent_reward = convert(env_reward, target_namespace=agent_framework, device=device)
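
The two interception points can be sketched as a minimal wrapper. This is an illustrative sketch under stated assumptions, not Gymnasium's implementation: ToyEnv and ConversionWrapper are hypothetical classes, and plain Python lists stand in for the agent framework's array type so the example runs with NumPy alone.

```python
import numpy as np


class ToyEnv:
    """Hypothetical environment that accepts and returns NumPy arrays."""

    def reset(self):
        self.state = np.zeros(2)
        return self.state.copy(), {}

    def step(self, action):
        # The environment expects its native array type.
        assert isinstance(action, np.ndarray)
        self.state = self.state + action
        reward = -float(np.abs(self.state).sum())
        return self.state.copy(), reward, False, False, {}


class ConversionWrapper:
    """Hypothetical wrapper converting at the environment boundary."""

    def __init__(self, env, to_env, to_agent):
        self.env = env
        self.to_env = to_env      # agent type -> environment type
        self.to_agent = to_agent  # environment type -> agent type

    def reset(self):
        obs, info = self.env.reset()
        return self.to_agent(obs), info

    def step(self, action):
        # Input side: convert the action before it reaches the environment.
        result = self.env.step(self.to_env(action))
        obs, reward, terminated, truncated, info = result
        # Output side: convert observation and reward for the agent.
        return self.to_agent(obs), self.to_agent(reward), terminated, truncated, info


# Lists stand in for the agent framework's array type in this sketch.
env = ConversionWrapper(
    ToyEnv(),
    to_env=np.asarray,
    to_agent=lambda x: np.asarray(x).tolist(),
)
first_obs, _ = env.reset()
next_obs, reward, terminated, truncated, _ = env.step([1.0, -2.0])
```

Passing the two conversion functions in as arguments keeps the wrapper independent of any particular framework pair; a real wrapper would also need to convert nested structures and the info dictionary where appropriate.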

The device parameter enables GPU placement: convert(o, target_namespace, device="cuda:0") places the resulting tensor directly on the specified GPU.
