Implementation:LaurentMazare Tch rs VecGymEnv

From Leeroopedia


Knowledge Sources
Domains Reinforcement Learning, Python Interop, Environment Simulation
Last Updated 2026-02-08 00:00 GMT

Overview

Provides a Rust wrapper around vectorized OpenAI Gym environments for parallel reinforcement learning training, bridging Python Gym via the cpython crate.

Description

The VecGymEnv struct wraps a Python-based vectorized Gym environment so that multiple environments can run in parallel under the control of Rust code. It uses the cpython crate to acquire the Python GIL and call into atari_wrappers, a custom Python module that provides preprocessed Atari environments.

The wrapper exposes three key operations:

new constructs the vectorized environment by importing the Python atari_wrappers module, extracting the action space size and observation space shape, and prepending the number of processes to the observation shape.

reset returns the initial observations as a tch Tensor reshaped to match the observation space.

step takes a vector of actions (one per parallel environment) and returns a Step struct containing observation, reward, and done tensors.
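The shape handling performed by new can be sketched in plain Rust without any Python or tch dependency. The helper names batched_shape and numel are hypothetical, and the [84, 84, 4] shape mentioned in the comment is only an illustrative value, not something the wrapper guarantees.

```rust
/// Prepend the number of parallel environments to a per-environment
/// observation shape, e.g. 8 and [84, 84, 4] give [8, 84, 84, 4].
/// Hypothetical helper for illustration; not part of VecGymEnv.
pub fn batched_shape(nprocesses: i64, per_env_shape: &[i64]) -> Vec<i64> {
    let mut shape = Vec::with_capacity(per_env_shape.len() + 1);
    shape.push(nprocesses);
    shape.extend_from_slice(per_env_shape);
    shape
}

/// Total number of elements a tensor of the given shape holds.
pub fn numel(shape: &[i64]) -> i64 {
    shape.iter().product()
}
```

Storing the batched shape once at construction time means that reset and step can reuse it when reshaping the flat data that comes back from Python.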

Data conversion happens at the boundary: Python numpy arrays are flattened and extracted as Rust vectors (Vec<u8> for observations via PyBuffer, Vec<f32> for rewards and done flags), then converted to tch::Tensor with appropriate shapes and types. Observations are cast to Float kind for neural network consumption.
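As a rough illustration of that boundary conversion, the sketch below turns a flat byte buffer into f32 values after checking that its length matches the target shape. It uses only the standard library and stands in for the tensor operations; bytes_to_f32 is a hypothetical name, and the real wrapper performs the equivalent steps on tch tensors.

```rust
/// Convert a flat buffer of u8 pixel values into f32 values, checking
/// first that the buffer length matches the element count of `shape`.
/// Values are cast as-is (0..=255); no rescaling is applied.
/// Hypothetical helper for illustration; not part of VecGymEnv.
pub fn bytes_to_f32(flat: &[u8], shape: &[i64]) -> Result<Vec<f32>, String> {
    let expected: i64 = shape.iter().product();
    if flat.len() as i64 != expected {
        return Err(format!(
            "buffer has {} elements but shape {:?} needs {}",
            flat.len(),
            shape,
            expected
        ));
    }
    Ok(flat.iter().map(|&b| f32::from(b)).collect())
}
```

The cast only changes the element type. If a model expects inputs in a different range, that scaling would have to happen elsewhere.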

Usage

Use this wrapper when implementing reinforcement learning algorithms that benefit from parallel environment execution for faster data collection. It requires a Python environment with OpenAI Gym and the custom atari_wrappers module available on the Python path.

Code Reference

Source Location

Signature

#[derive(Debug)]
pub struct Step {
    pub obs: Tensor,
    pub reward: Tensor,
    pub is_done: Tensor,
}

pub struct VecGymEnv {
    env: PyObject,
    action_space: i64,
    observation_space: Vec<i64>,
}

impl VecGymEnv {
    pub fn new(name: &str, img_dir: Option<&str>, nprocesses: i64) -> PyResult<VecGymEnv>
    pub fn reset(&self) -> PyResult<Tensor>
    pub fn step(&self, action: Vec<i64>) -> PyResult<Step>
    pub fn action_space(&self) -> i64
    pub fn observation_space(&self) -> &[i64]
}

Import

use cpython::{buffer::PyBuffer, NoArgs, ObjectProtocol, PyObject, PyResult, Python};
use tch::Tensor;

I/O Contract

Input | Type | Description
name | &str | Gym environment name (e.g., "SpaceInvadersNoFrameskip-v4")
img_dir | Option<&str> | Optional directory for saving rendered frames
nprocesses | i64 | Number of parallel environments to run
action | Vec<i64> | Vector of action indices, one per environment

Output | Type | Description
Step.obs | Tensor | Observation tensor with the shape returned by observation_space(), i.e. [nprocesses, ...per-environment observation shape]
Step.reward | Tensor | Reward tensor shaped [nprocesses]
Step.is_done | Tensor | Done flag tensor shaped [nprocesses] (1.0 if the episode ended)
action_space() | i64 | Number of discrete actions available
observation_space() | &[i64] | Shape of observations, including the leading nprocesses dimension
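The contract for the action input (one index per environment, each below action_space()) can be checked on the Rust side before calling step. The following is a minimal sketch using only the standard library. validate_actions is a hypothetical helper, and whether the Python side performs its own validation is not stated here.

```rust
/// Check that `actions` holds exactly one index per environment and
/// that every index lies in 0..action_space.
/// Hypothetical helper for illustration; not part of VecGymEnv.
pub fn validate_actions(
    actions: &[i64],
    nprocesses: i64,
    action_space: i64,
) -> Result<(), String> {
    if actions.len() as i64 != nprocesses {
        return Err(format!(
            "expected {} actions, got {}",
            nprocesses,
            actions.len()
        ));
    }
    for (env_idx, &a) in actions.iter().enumerate() {
        if a < 0 || a >= action_space {
            return Err(format!(
                "action {} for environment {} is outside 0..{}",
                a, env_idx, action_space
            ));
        }
    }
    Ok(())
}
```

Failing early in Rust gives a clearer error message than a Python exception surfaced through PyResult.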

Usage Examples

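use cpython::PyResult;
use vec_gym_env::VecGymEnv;

fn run() -> PyResult<()> {
    // Create 8 parallel SpaceInvaders environments
    let env = VecGymEnv::new("SpaceInvadersNoFrameskip-v4", None, 8)?;
    println!("Actions: {}", env.action_space());
    println!("Obs shape: {:?}", env.observation_space());

    // Reset all environments and inspect the initial observations
    let obs = env.reset()?;
    println!("Initial obs size: {:?}", obs.size());

    // Step every environment with action 0 as a placeholder;
    // a real agent would choose each action from its policy
    let actions = vec![0i64; 8]; // one action index per environment
    let step = env.step(actions)?;
    // step.obs: next observations
    // step.reward: rewards received
    // step.is_done: episode termination flags
    println!("Rewards: {:?}", step.reward);
    Ok(())
}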
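The reward and is_done outputs are usually consumed together, for example to track per-environment episode returns. The sketch below shows that bookkeeping on plain slices rather than tensors, assuming the reward and done values have already been copied out of the Step tensors. EpisodeTracker is a hypothetical helper and is not part of the wrapper.

```rust
/// Running episode return for each parallel environment.
/// Hypothetical helper for illustration; not part of VecGymEnv.
pub struct EpisodeTracker {
    running: Vec<f32>,
}

impl EpisodeTracker {
    pub fn new(nprocesses: usize) -> Self {
        Self { running: vec![0.0; nprocesses] }
    }

    /// Add one step of rewards. Returns the totals of the episodes that
    /// ended on this step (done flag of 1.0) and resets those slots.
    pub fn update(&mut self, rewards: &[f32], is_done: &[f32]) -> Vec<f32> {
        assert_eq!(rewards.len(), self.running.len());
        assert_eq!(is_done.len(), self.running.len());
        let mut finished = Vec::new();
        for ((total, &r), &d) in self.running.iter_mut().zip(rewards).zip(is_done) {
            *total += r;
            if d > 0.5 {
                finished.push(*total);
                *total = 0.0;
            }
        }
        finished
    }
}
```

Because the done flags arrive as floats (1.0 or 0.0), the same logic can also be written with tensor arithmetic by multiplying the running sums by one minus the done mask.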
