
Implementation:LaurentMazare tch-rs GymEnv

From Leeroopedia


Knowledge Sources
Domains Reinforcement Learning, Python Interop, Game AI
Last Updated 2026-02-08 00:00 GMT

Overview

Provides a Rust wrapper around the OpenAI Gym Python API using cpython, enabling reinforcement learning agents to interact with Gym environments through tch tensors.

Description

This module implements a Rust interface to OpenAI Gym environments via the cpython crate for Python interop. It consists of two main types:

Step<A> struct: Represents the result of taking an action in the environment. It contains:

  • obs: A Tensor with the observation after the action.
  • action: The action that was taken (generic type A).
  • reward: A f64 reward value.
  • is_done: A bool indicating if the episode has terminated.
  • The helper method copy_with_obs returns a copy of the step that carries a different observation tensor, keeping the action, reward, and done flag unchanged.
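The Step fields and copy_with_obs semantics can be sketched in plain Rust. This is a simplified illustration: it substitutes Vec<f32> for tch::Tensor so the snippet compiles without the tch crate; the real method copies the tensor itself.

```rust
// Simplified sketch of Step<A>, with Vec<f32> standing in for tch::Tensor.
#[derive(Debug)]
pub struct Step<A> {
    pub obs: Vec<f32>,
    pub action: A,
    pub reward: f64,
    pub is_done: bool,
}

impl<A: Copy> Step<A> {
    /// Returns a copy of the step carrying a different observation,
    /// keeping the action, reward, and done flag unchanged.
    pub fn copy_with_obs(&self, obs: &[f32]) -> Step<A> {
        Step {
            obs: obs.to_vec(),
            action: self.action,
            reward: self.reward,
            is_done: self.is_done,
        }
    }
}

fn main() {
    let step = Step { obs: vec![0.0, 0.1], action: 1i64, reward: 1.0, is_done: false };
    // Swap in a larger observation (e.g., a stack of frames); everything
    // else is carried over unchanged.
    let stacked = step.copy_with_obs(&[0.0, 0.1, 0.2, 0.3]);
    assert_eq!(stacked.action, 1);
    assert_eq!(stacked.obs.len(), 4);
    println!("reward carried over: {}", stacked.reward);
}
```

This pattern is useful when post-processing observations (frame stacking, normalization) without disturbing the rest of the transition.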

GymEnv struct: Wraps a Python Gym environment object. Key methods:

  • new(name): Acquires the Python GIL, imports the gym module, calls gym.make(name), seeds it with 42, and extracts the action space size and observation space shape. Supports both discrete (via .n) and continuous (via .shape) action spaces.
  • reset(): Resets the environment and returns the initial observation as a Tensor by extracting a Vec<f32> from the Python object.
  • step(action): Applies an action (any type implementing ToPyObject + Copy), extracts the observation, reward, and done flag from the Python tuple, and returns a Step<A>.
  • action_space(): Returns the number of discrete actions, or the action-vector dimension for continuous action spaces.
  • observation_space(): Returns the shape of observation tensors.
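The calling contract of these methods can be illustrated with a hypothetical pure-Rust mock; the real wrapper forwards each call to the Python gym module through cpython, but the shapes of the calls are the same. MockCartPole and its 3-step episode cutoff are invented for this sketch.

```rust
// Hypothetical mock mirroring the GymEnv contract (reset / step /
// action_space / observation_space) without a Python interpreter.
struct MockCartPole {
    t: u32, // step counter within the current episode
}

struct Step {
    obs: Vec<f32>,
    action: i64,
    reward: f64,
    is_done: bool,
}

impl MockCartPole {
    fn new() -> Self {
        MockCartPole { t: 0 }
    }

    // reset(): returns the initial observation (4 values for CartPole).
    fn reset(&mut self) -> Vec<f32> {
        self.t = 0;
        vec![0.0; 4]
    }

    // step(): returns the observation, reward, and done flag, mirroring
    // the (obs, reward, done, info) tuple gym hands back.
    fn step(&mut self, action: i64) -> Step {
        self.t += 1;
        Step {
            obs: vec![0.0; 4],
            action,
            reward: 1.0,
            is_done: self.t >= 3, // this mock ends every episode after 3 steps
        }
    }

    fn action_space(&self) -> i64 { 2 }            // CartPole: left / right
    fn observation_space(&self) -> &[i64] { &[4] } // 4-dimensional state
}

fn main() {
    let mut env = MockCartPole::new();
    let obs = env.reset();
    assert_eq!(obs.len(), env.observation_space()[0] as usize);

    // Run one episode, accumulating reward until the done flag is set.
    let mut total = 0.0;
    loop {
        let s = env.step(0);
        assert_eq!(s.obs.len(), 4);
        total += s.reward + (s.action as f64) * 0.0; // action carried through
        if s.is_done {
            break;
        }
    }
    println!("episode return: {}", total);
}
```

An agent written against this contract can swap the mock for the real GymEnv without changing its control loop.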

Usage

Use this wrapper when building reinforcement learning agents in Rust that need to interact with OpenAI Gym environments. It bridges the gap between Python-based environments and Rust-based tch tensor computations.

Code Reference

Source Location

Signature

pub struct Step<A> {
    pub obs: Tensor,
    pub action: A,
    pub reward: f64,
    pub is_done: bool,
}

impl<A: Copy> Step<A> {
    pub fn copy_with_obs(&self, obs: &Tensor) -> Step<A>
}

pub struct GymEnv {
    env: PyObject,
    action_space: i64,
    observation_space: Vec<i64>,
}

impl GymEnv {
    pub fn new(name: &str) -> PyResult<GymEnv>
    pub fn reset(&self) -> PyResult<Tensor>
    pub fn step<A: ToPyObject + Copy>(&self, action: A) -> PyResult<Step<A>>
    pub fn action_space(&self) -> i64
    pub fn observation_space(&self) -> &[i64]
}

Import

// Module within the reinforcement-learning example.
use cpython::{NoArgs, ObjectProtocol, PyObject, PyResult, Python, ToPyObject};
use tch::Tensor;

I/O Contract

Inputs

Name Type Required Description
name &str Yes OpenAI Gym environment name (e.g., "CartPole-v0", "SpaceInvadersNoFrameskip-v4").
action A: ToPyObject + Copy Yes (for step) The action to take in the environment (e.g., i64 for discrete action spaces).

Outputs

Name Type Description
GymEnv struct An initialized Gym environment wrapper ready for interaction.
Tensor tch::Tensor Observation tensor from reset or step, created from Vec<f32>.
Step<A> struct Contains observation tensor, action, reward (f64), and done flag (bool).
action_space i64 Number of available actions in the environment.
observation_space &[i64] Shape of the observation tensor.
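Since the observation arrives from Python as a flat Vec<f32> and observation_space gives its intended shape, a useful sanity check is that the vector length equals the product of the shape dimensions before building the tensor. A minimal sketch (expected_elems is an illustrative helper, not part of the wrapper):

```rust
// Number of elements a tensor of the given shape should hold.
fn expected_elems(shape: &[i64]) -> usize {
    shape.iter().product::<i64>() as usize
}

fn main() {
    // Atari frames (e.g., SpaceInvaders) are 210 x 160 RGB.
    let atari_space: Vec<i64> = vec![210, 160, 3];
    assert_eq!(expected_elems(&atari_space), 100_800);

    // CartPole observations are 4 scalars.
    let cartpole_space: Vec<i64> = vec![4];
    let obs: Vec<f32> = vec![0.01, -0.02, 0.03, 0.04];
    assert_eq!(obs.len(), expected_elems(&cartpole_space));

    println!("shapes check out");
}
```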

Usage Examples

use cpython::{NoArgs, ObjectProtocol, PyObject, PyResult, Python, ToPyObject};
use tch::Tensor;

// Assumes this runs inside a function returning PyResult<()>, with GymEnv
// in scope (it is defined in the reinforcement-learning example module).

// Create a CartPole environment.
let env = GymEnv::new("CartPole-v0")?;
println!("action space: {:?}", env.action_space());
println!("observation space: {:?}", env.observation_space());

// Reset and get the initial observation.
let mut obs = env.reset()?;

// Run one episode with a fixed action; a real agent would pick the
// action from a policy conditioned on obs.
loop {
    let action = 0i64;
    let step = env.step(action)?;
    obs = step.obs;
    println!("obs shape: {:?}, reward: {}, done: {}", obs.size(), step.reward, step.is_done);
    if step.is_done {
        break;
    }
}
