Implementation:LaurentMazare Tch rs VecGymEnv
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement Learning, Python Interop, Environment Simulation |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Provides a Rust wrapper around vectorized OpenAI Gym environments for parallel reinforcement learning training, bridging Python Gym via the cpython crate.
Description
The VecGymEnv struct wraps a Python-based vectorized Gym environment, enabling multiple parallel environments to run simultaneously from Rust. It uses the cpython crate to acquire the Python GIL and call into Python's atari_wrappers module, which provides preprocessed Atari environments.
The wrapper exposes three key operations:
- new constructs the vectorized environment: it imports the Python atari_wrappers module, reads the action space size and the observation space shape, and prepends the number of processes to the observation shape.
- reset returns the initial observations as a tch Tensor reshaped to match the observation space.
- step takes a vector of actions (one per parallel environment) and returns a Step struct containing observation, reward, and done tensors.
Data conversion happens at the boundary: Python numpy arrays are flattened and extracted as Rust vectors (Vec<u8> for observations via PyBuffer, Vec<f32> for rewards and done flags), then converted to tch::Tensor with appropriate shapes and types. Observations are cast to Float kind for neural network consumption.
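The shape handling and byte-to-float conversion described above can be sketched in plain Rust. This is a minimal illustration, not the code from vec_gym_env.rs: the function names batched_shape and observations_to_f32 are hypothetical, and Vec<f32> stands in for tch::Tensor so that the sketch needs neither libtorch nor a Python interpreter.

```rust
/// Prepend the number of parallel environments to a per-environment
/// observation shape, e.g. [84, 84] with 8 processes becomes [8, 84, 84].
/// (Hypothetical helper, named for illustration only.)
fn batched_shape(nprocesses: i64, obs_shape: &[i64]) -> Vec<i64> {
    let mut shape = Vec::with_capacity(obs_shape.len() + 1);
    shape.push(nprocesses);
    shape.extend_from_slice(obs_shape);
    shape
}

/// Convert a flat byte buffer (standing in for a flattened numpy array)
/// into f32 values, checking that its length matches the target shape.
/// (Hypothetical helper; a Vec<f32> is used in place of a tch Tensor.)
fn observations_to_f32(flat: &[u8], shape: &[i64]) -> Result<Vec<f32>, String> {
    let expected: i64 = shape.iter().product();
    if flat.len() as i64 != expected {
        return Err(format!(
            "expected {} elements for shape {:?}, got {}",
            expected,
            shape,
            flat.len()
        ));
    }
    // Each byte is widened to f32 without rescaling.
    Ok(flat.iter().map(|&b| f32::from(b)).collect())
}
```

Checking the element count before conversion mirrors the fact that a reshape only succeeds when the flat buffer holds exactly the product of the target dimensions.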
Usage
Use this wrapper when implementing reinforcement learning algorithms that benefit from parallel environment execution for faster data collection. It requires a Python environment with OpenAI Gym and the custom atari_wrappers module available on the Python path.
Code Reference
Source Location
- Repository: LaurentMazare_Tch_rs
- File: examples/reinforcement-learning/vec_gym_env.rs
- Lines: 1-66
Signature
#[derive(Debug)]
pub struct Step {
    pub obs: Tensor,
    pub reward: Tensor,
    pub is_done: Tensor,
}

pub struct VecGymEnv {
    env: PyObject,
    action_space: i64,
    observation_space: Vec<i64>,
}

impl VecGymEnv {
    pub fn new(name: &str, img_dir: Option<&str>, nprocesses: i64) -> PyResult<VecGymEnv>
    pub fn reset(&self) -> PyResult<Tensor>
    pub fn step(&self, action: Vec<i64>) -> PyResult<Step>
    pub fn action_space(&self) -> i64
    pub fn observation_space(&self) -> &[i64]
}
Import
use cpython::{buffer::PyBuffer, NoArgs, ObjectProtocol, PyObject, PyResult, Python};
use tch::Tensor;
I/O Contract
| Input | Type | Description |
|---|---|---|
| name | &str | Gym environment name (e.g., "SpaceInvadersNoFrameskip-v4") |
| img_dir | Option<&str> | Optional directory for saving rendered frames |
| nprocesses | i64 | Number of parallel environments to run |
| action | Vec<i64> | Vector of action indices, one per environment |

| Output | Type | Description |
|---|---|---|
| Step.obs | Tensor | Observation tensor with the shape returned by observation_space(), i.e. [nprocesses, ...per-environment observation shape] |
| Step.reward | Tensor | Reward tensor shaped [nprocesses] |
| Step.is_done | Tensor | Done flag tensor shaped [nprocesses] (1.0 if episode ended) |
| action_space() | i64 | Number of discrete actions available |
| observation_space() | &[i64] | Shape of observations including nprocesses dimension |
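The input side of this contract (one action per environment, each a valid discrete action index) can be checked before calling step. The sketch below is an assumption-laden illustration: validate_actions is a hypothetical helper that does not exist in vec_gym_env.rs, and it assumes valid actions are the integers from 0 up to, but not including, action_space().

```rust
/// Hypothetical helper: check an action vector against the I/O contract
/// before handing it to `step`. Assumes valid indices are 0..action_space.
fn validate_actions(actions: &[i64], nprocesses: i64, action_space: i64) -> Result<(), String> {
    // There must be exactly one action per parallel environment.
    if actions.len() as i64 != nprocesses {
        return Err(format!(
            "expected {} actions, got {}",
            nprocesses,
            actions.len()
        ));
    }
    // Every action must be a valid discrete action index.
    for (i, &a) in actions.iter().enumerate() {
        if a < 0 || a >= action_space {
            return Err(format!(
                "action {} for environment {} is outside 0..{}",
                a, i, action_space
            ));
        }
    }
    Ok(())
}
```

Validating on the Rust side produces an error that names the offending environment, which is likely easier to diagnose than an exception raised inside the Python environment.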
Usage Examples
use vec_gym_env::VecGymEnv;
// Create 8 parallel SpaceInvaders environments
let env = VecGymEnv::new("SpaceInvadersNoFrameskip-v4", None, 8)?;
println!("Actions: {}", env.action_space());
println!("Obs shape: {:?}", env.observation_space());
// Reset all environments
let obs = env.reset()?;
// Take a step with the same action (index 0) in every environment
let actions = vec![0i64; 8]; // one action per environment
let step = env.step(actions)?;
// step.obs: next observations
// step.reward: rewards received
// step.is_done: episode termination flags
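A common use of the reward and done tensors is tracking per-environment episode returns. The following is a minimal sketch under stated assumptions: ReturnTracker is a hypothetical type, plain f32 slices stand in for the values of step.reward and step.is_done, and an episode boundary is assumed to be signalled by a non-zero is_done entry.

```rust
/// Hypothetical tracker for per-environment episode returns. Plain slices
/// stand in for the contents of `step.reward` and `step.is_done`.
struct ReturnTracker {
    /// Return accumulated so far in each environment's current episode.
    running: Vec<f32>,
    /// Returns of all episodes that have finished.
    finished: Vec<f32>,
}

impl ReturnTracker {
    fn new(nprocesses: usize) -> Self {
        ReturnTracker {
            running: vec![0.0; nprocesses],
            finished: Vec::new(),
        }
    }

    /// Fold one step's rewards into the running totals; when an environment
    /// reports done (non-zero flag), record its return and reset its total.
    fn update(&mut self, reward: &[f32], is_done: &[f32]) {
        assert_eq!(reward.len(), self.running.len());
        assert_eq!(is_done.len(), self.running.len());
        for (i, (&r, &d)) in reward.iter().zip(is_done).enumerate() {
            self.running[i] += r;
            if d != 0.0 {
                self.finished.push(self.running[i]);
                self.running[i] = 0.0;
            }
        }
    }
}
```

In a training loop, update would be called once per step with the values extracted from the Step struct, and the mean of finished gives an average episode return.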