Implementation:LaurentMazare Tch rs VecGymEnv

Knowledge Sources	LaurentMazare_Tch_rs
Domains	Reinforcement Learning, Python Interop, Environment Simulation
Last Updated	2026-02-08 00:00 GMT

Overview

Provides a Rust wrapper around vectorized OpenAI Gym environments for parallel reinforcement learning training. The wrapper calls into Python Gym through the cpython crate.

Description

The VecGymEnv struct wraps a Python-based vectorized Gym environment, so that several environments can be stepped in parallel from Rust. It uses the cpython crate to acquire the Python GIL and call into a custom Python module, atari_wrappers, which provides preprocessed Atari environments.

The wrapper exposes three key operations. new constructs the vectorized environment: it imports the Python atari_wrappers module, reads the action space size and the observation space shape, and prepends the number of processes to the observation shape. reset returns the initial observations as a tch Tensor reshaped to the stored observation shape. step takes a vector of actions (one per parallel environment) and returns a Step struct containing the observation, reward, and done tensors.
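The shape bookkeeping done by new can be illustrated without Python or tch. The following is a minimal sketch, not the code from vec_gym_env.rs: batched_shape and numel are hypothetical helper names, and the only assumption is that shapes are stored as i64 values, as in the struct signature.

```rust
/// Hypothetical helper (not part of the tch-rs example): prepend the number
/// of parallel environments to a single environment's observation shape.
fn batched_shape(nprocesses: i64, obs_shape: &[i64]) -> Vec<i64> {
    let mut shape = Vec::with_capacity(obs_shape.len() + 1);
    shape.push(nprocesses);
    shape.extend_from_slice(obs_shape);
    shape
}

/// Number of elements a tensor of the given shape holds.
fn numel(shape: &[i64]) -> i64 {
    shape.iter().product()
}
```

With an assumed per-environment shape of [84, 84, 4] and 8 processes, batched_shape(8, &[84, 84, 4]) yields [8, 84, 84, 4]. The flattened buffer received from Python would then need numel of that shape elements to be reshaped successfully.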

Data conversion happens at the boundary: Python numpy arrays are flattened and extracted as Rust vectors (Vec<u8> for observations via PyBuffer, Vec<f32> for rewards and done flags), then converted to tch::Tensor with appropriate shapes and types. Observations are cast to Float kind for neural network consumption.
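The boundary conversion for observations can be sketched in plain Rust. This is an illustration under stated assumptions rather than the wrapper's implementation: bytes_to_float_obs is a hypothetical function that stands in for the extract, reshape, and cast sequence, with a Vec<f32> taking the place of a tch::Tensor.

```rust
/// Hypothetical stand-in for the boundary conversion: check that a flattened
/// byte buffer matches the expected shape, then widen each byte to f32.
fn bytes_to_float_obs(flat: &[u8], shape: &[i64]) -> Result<Vec<f32>, String> {
    let expected: i64 = shape.iter().product();
    if flat.len() as i64 != expected {
        return Err(format!(
            "buffer has {} elements, but shape {:?} needs {}",
            flat.len(),
            shape,
            expected
        ));
    }
    // The cast changes the element type only; values stay in 0..=255.
    Ok(flat.iter().map(|&b| f32::from(b)).collect())
}
```

The cast by itself does not rescale pixel values, so any normalization has to happen elsewhere, for example inside the model.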

Usage

Use this wrapper when implementing reinforcement learning algorithms that benefit from parallel environment execution for faster data collection. It requires a Python environment with OpenAI Gym and the custom atari_wrappers module available on the Python path.

Code Reference

Source Location

Repository: LaurentMazare_Tch_rs
File: examples/reinforcement-learning/vec_gym_env.rs
Lines: 1-66

Signature

#[derive(Debug)]
pub struct Step {
    pub obs: Tensor,
    pub reward: Tensor,
    pub is_done: Tensor,
}

pub struct VecGymEnv {
    env: PyObject,
    action_space: i64,
    observation_space: Vec<i64>,
}

impl VecGymEnv {
    pub fn new(name: &str, img_dir: Option<&str>, nprocesses: i64) -> PyResult<VecGymEnv>
    pub fn reset(&self) -> PyResult<Tensor>
    pub fn step(&self, action: Vec<i64>) -> PyResult<Step>
    pub fn action_space(&self) -> i64
    pub fn observation_space(&self) -> &[i64]
}

Import

use cpython::{buffer::PyBuffer, NoArgs, ObjectProtocol, PyObject, PyResult, Python};
use tch::Tensor;

I/O Contract

Input	Type	Description
name	&str	Gym environment name (e.g., "SpaceInvadersNoFrameskip-v4")
img_dir	Option<&str>	Optional directory for saving rendered frames
nprocesses	i64	Number of parallel environments to run
action	Vec<i64>	Vector of action indices, one per environment
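Because step expects exactly one action per environment, a caller may want to check the action vector before handing it to Python. The sketch below assumes a discrete action space whose valid indices are 0 up to action_space() - 1; validate_actions is a hypothetical helper and is not part of the wrapper.

```rust
/// Hypothetical pre-flight check: one action per environment, each inside
/// the discrete range 0..action_space.
fn validate_actions(
    actions: &[i64],
    nprocesses: usize,
    action_space: i64,
) -> Result<(), String> {
    if actions.len() != nprocesses {
        return Err(format!(
            "expected {} actions, got {}",
            nprocesses,
            actions.len()
        ));
    }
    for (env_idx, &a) in actions.iter().enumerate() {
        if a < 0 || a >= action_space {
            return Err(format!(
                "action {} for environment {} is outside 0..{}",
                a, env_idx, action_space
            ));
        }
    }
    Ok(())
}
```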

Output	Type	Description
Step.obs	Tensor	Float observation tensor with the shape returned by observation_space(), i.e. [nprocesses, ...per-environment observation shape]
Step.reward	Tensor	Reward tensor shaped [nprocesses]
Step.is_done	Tensor	Done flag tensor shaped [nprocesses] (1.0 if episode ended)
action_space()	i64	Number of discrete actions available
observation_space()	&[i64]	Shape of observations including nprocesses dimension
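A common way to consume the reward and is_done outputs is to keep a running return per environment and reset it whenever that environment's episode ends. The sketch below uses plain f32 slices in place of tensors; ReturnTracker is a hypothetical type introduced for illustration, and it assumes is_done holds 1.0 for finished episodes and 0.0 otherwise, as described in the table above.

```rust
/// Running per-environment episode returns (hypothetical helper type).
struct ReturnTracker {
    running: Vec<f32>,
    finished: Vec<f32>,
}

impl ReturnTracker {
    fn new(nprocesses: usize) -> Self {
        ReturnTracker {
            running: vec![0.0; nprocesses],
            finished: Vec::new(),
        }
    }

    /// Add this step's rewards. Where is_done is 1.0, record the finished
    /// episode's return and reset that environment's running sum.
    fn update(&mut self, reward: &[f32], is_done: &[f32]) {
        assert_eq!(reward.len(), self.running.len());
        assert_eq!(is_done.len(), self.running.len());
        for i in 0..self.running.len() {
            self.running[i] += reward[i];
            if is_done[i] > 0.5 {
                self.finished.push(self.running[i]);
                self.running[i] = 0.0;
            }
        }
    }
}
```

Calling update once per step with the values of Step.reward and Step.is_done keeps finished holding the total reward of every completed episode, which is a convenient training metric.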

Usage Examples

use vec_gym_env::VecGymEnv;

// The `?` operator needs an enclosing function that returns PyResult.
fn run() -> cpython::PyResult<()> {
    // Create 8 parallel SpaceInvaders environments
    let env = VecGymEnv::new("SpaceInvadersNoFrameskip-v4", None, 8)?;
    println!("Actions: {}", env.action_space());
    println!("Obs shape: {:?}", env.observation_space());

    // Reset all environments
    let obs = env.reset()?;
    println!("Initial obs size: {:?}", obs.size());

    // Take a step with action 0 in every environment
    let actions = vec![0i64; 8]; // one action per environment
    let step = env.step(actions)?;
    // step.obs: next observations
    // step.reward: rewards received
    // step.is_done: episode termination flags
    println!("Rewards: {:?}", step.reward);
    Ok(())
}

Related Pages

Principle:LaurentMazare_Tch_rs_Vectorized_Environment
