
Implementation:LaurentMazare tch-rs GymEnv

From Leeroopedia


Knowledge Sources
Domains Reinforcement Learning, Python Interop, Game AI
Last Updated 2026-02-08 00:00 GMT

Overview

Provides a Rust wrapper around the OpenAI Gym Python API using cpython, enabling reinforcement learning agents to interact with Gym environments through tch tensors.

Description

This module implements a Rust interface to OpenAI Gym environments via the cpython crate for Python interop. It consists of two main types:

Step<A> struct: Represents the result of taking an action in the environment. It contains:

  • obs: A Tensor with the observation after the action.
  • action: The action that was taken (generic type A).
  • reward: A f64 reward value.
  • is_done: A bool indicating if the episode has terminated.
  • The helper method copy_with_obs returns a copy of the step that carries a different observation tensor, keeping the action, reward, and done flag unchanged.
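The Step fields and copy_with_obs semantics can be sketched in plain Rust. This is a simplified illustration: it substitutes Vec<f32> for tch::Tensor so the snippet compiles without the tch crate; the real method copies the tensor itself.

```rust
// Simplified sketch of Step<A>, with Vec<f32> standing in for tch::Tensor.
#[derive(Debug)]
pub struct Step<A> {
    pub obs: Vec<f32>,
    pub action: A,
    pub reward: f64,
    pub is_done: bool,
}

impl<A: Copy> Step<A> {
    /// Returns a copy of the step carrying a different observation,
    /// keeping the action, reward, and done flag unchanged.
    pub fn copy_with_obs(&self, obs: &[f32]) -> Step<A> {
        Step {
            obs: obs.to_vec(),
            action: self.action,
            reward: self.reward,
            is_done: self.is_done,
        }
    }
}

fn main() {
    let step = Step { obs: vec![0.0, 0.1], action: 1i64, reward: 1.0, is_done: false };
    // Swap in a larger observation (e.g., a stack of frames); everything
    // else is carried over unchanged.
    let stacked = step.copy_with_obs(&[0.0, 0.1, 0.2, 0.3]);
    assert_eq!(stacked.action, 1);
    assert_eq!(stacked.obs.len(), 4);
    println!("reward carried over: {}", stacked.reward);
}
```

This pattern is useful when post-processing observations (frame stacking, normalization) without disturbing the rest of the transition.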

GymEnv struct: Wraps a Python Gym environment object. Key methods:

  • new(name): Acquires the Python GIL, imports the gym module, calls gym.make(name), seeds it with 42, and extracts the action space size and observation space shape. Supports both discrete (via .n) and continuous (via .shape) action spaces.
  • reset(): Resets the environment and returns the initial observation as a Tensor by extracting a Vec<f32> from the Python object.
  • step(action): Applies an action (any type implementing ToPyObject + Copy), extracts the observation, reward, and done flag from the Python tuple, and returns a Step<A>.
  • action_space(): Returns the number of discrete actions, or the action-vector dimension for continuous action spaces.
  • observation_space(): Returns the shape of observation tensors.
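The calling contract of these methods can be illustrated with a hypothetical pure-Rust mock; the real wrapper forwards each call to the Python gym module through cpython, but the shapes of the calls are the same. MockCartPole and its 3-step episode cutoff are invented for this sketch.

```rust
// Hypothetical mock mirroring the GymEnv contract (reset / step /
// action_space / observation_space) without a Python interpreter.
struct MockCartPole {
    t: u32, // step counter within the current episode
}

struct Step {
    obs: Vec<f32>,
    action: i64,
    reward: f64,
    is_done: bool,
}

impl MockCartPole {
    fn new() -> Self {
        MockCartPole { t: 0 }
    }

    // reset(): returns the initial observation (4 values for CartPole).
    fn reset(&mut self) -> Vec<f32> {
        self.t = 0;
        vec![0.0; 4]
    }

    // step(): returns the observation, reward, and done flag, mirroring
    // the (obs, reward, done, info) tuple gym hands back.
    fn step(&mut self, action: i64) -> Step {
        self.t += 1;
        Step {
            obs: vec![0.0; 4],
            action,
            reward: 1.0,
            is_done: self.t >= 3, // this mock ends every episode after 3 steps
        }
    }

    fn action_space(&self) -> i64 { 2 }            // CartPole: left / right
    fn observation_space(&self) -> &[i64] { &[4] } // 4-dimensional state
}

fn main() {
    let mut env = MockCartPole::new();
    let obs = env.reset();
    assert_eq!(obs.len(), env.observation_space()[0] as usize);

    // Run one episode, accumulating reward until the done flag is set.
    let mut total = 0.0;
    loop {
        let s = env.step(0);
        assert_eq!(s.obs.len(), 4);
        total += s.reward + (s.action as f64) * 0.0; // action carried through
        if s.is_done {
            break;
        }
    }
    println!("episode return: {}", total);
}
```

An agent written against this contract can swap the mock for the real GymEnv without changing its control loop.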

Usage

Use this wrapper when building reinforcement learning agents in Rust that need to interact with OpenAI Gym environments. It bridges the gap between Python-based environments and Rust-based tch tensor computations.

Code Reference

Source Location

Signature

pub struct Step<A> {
    pub obs: Tensor,
    pub action: A,
    pub reward: f64,
    pub is_done: bool,
}

impl<A: Copy> Step<A> {
    pub fn copy_with_obs(&self, obs: &Tensor) -> Step<A>
}

pub struct GymEnv {
    env: PyObject,
    action_space: i64,
    observation_space: Vec<i64>,
}

impl GymEnv {
    pub fn new(name: &str) -> PyResult<GymEnv>
    pub fn reset(&self) -> PyResult<Tensor>
    pub fn step<A: ToPyObject + Copy>(&self, action: A) -> PyResult<Step<A>>
    pub fn action_space(&self) -> i64
    pub fn observation_space(&self) -> &[i64]
}

Import

// Module within the reinforcement-learning example.
use cpython::{NoArgs, ObjectProtocol, PyObject, PyResult, Python, ToPyObject};
use tch::Tensor;

I/O Contract

Inputs

Name Type Required Description
name &str Yes OpenAI Gym environment name (e.g., "CartPole-v0", "SpaceInvadersNoFrameskip-v4").
action A: ToPyObject + Copy Yes (for step) The action to take in the environment (e.g., i64 for discrete action spaces).

Outputs

Name Type Description
GymEnv struct An initialized Gym environment wrapper ready for interaction.
Tensor tch::Tensor Observation tensor from reset or step, created from Vec<f32>.
Step<A> struct Contains observation tensor, action, reward (f64), and done flag (bool).
action_space i64 Number of available actions in the environment.
observation_space &[i64] Shape of the observation tensor.
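Since the observation arrives from Python as a flat Vec<f32> and observation_space gives its intended shape, a useful sanity check is that the vector length equals the product of the shape dimensions before building the tensor. A minimal sketch (expected_elems is an illustrative helper, not part of the wrapper):

```rust
// Number of elements a tensor of the given shape should hold.
fn expected_elems(shape: &[i64]) -> usize {
    shape.iter().product::<i64>() as usize
}

fn main() {
    // Atari frames (e.g., SpaceInvaders) are 210 x 160 RGB.
    let atari_space: Vec<i64> = vec![210, 160, 3];
    assert_eq!(expected_elems(&atari_space), 100_800);

    // CartPole observations are 4 scalars.
    let cartpole_space: Vec<i64> = vec![4];
    let obs: Vec<f32> = vec![0.01, -0.02, 0.03, 0.04];
    assert_eq!(obs.len(), expected_elems(&cartpole_space));

    println!("shapes check out");
}
```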

Usage Examples

use cpython::{NoArgs, ObjectProtocol, PyObject, PyResult, Python, ToPyObject};
use tch::Tensor;

// Assumes this runs inside a function returning PyResult<()>, with GymEnv
// in scope (it is defined in the reinforcement-learning example module).

// Create a CartPole environment.
let env = GymEnv::new("CartPole-v0")?;
println!("action space: {:?}", env.action_space());
println!("observation space: {:?}", env.observation_space());

// Reset and get the initial observation.
let mut obs = env.reset()?;

// Run one episode with a fixed action; a real agent would pick the
// action from a policy conditioned on obs.
loop {
    let action = 0i64;
    let step = env.step(action)?;
    obs = step.obs;
    println!("obs shape: {:?}, reward: {}, done: {}", obs.size(), step.reward, step.is_done);
    if step.is_done {
        break;
    }
}
