Implementation:LaurentMazare Tch rs Atari Wrappers

From Leeroopedia


Overview

atari_wrappers.py is a Python module located at examples/reinforcement-learning/atari_wrappers.py (308 lines) in the tch-rs repository. It provides DeepMind-style Atari environment wrappers for deep reinforcement learning training. The module consolidates observation preprocessing, reward shaping, frame manipulation, and multi-process environment vectorization into a composable wrapper pipeline. These wrappers transform raw Atari game environments from OpenAI Gym into a form suitable for training deep RL agents such as A2C and PPO, following the conventions established in the DQN Nature paper and subsequent DeepMind publications.

The module is organized into four logical sections: Atari game wrappers, environment factory functions, a vectorized environment base class, and a subprocess-based parallel environment runner.

Code Reference

Environment Wrapper Classes

All wrapper classes extend gym.Wrapper (or a specialized subclass) and follow the standard OpenAI Gym wrapper protocol.

Class | Base Class | Signature | Purpose
NoopResetEnv | gym.Wrapper | NoopResetEnv(env, noop_max=30) | Samples initial states by executing a random number of no-op actions (action 0) on reset, between 1 and noop_max. Asserts that action 0 is 'NOOP'.
FireResetEnv | gym.Wrapper | FireResetEnv(env) | Takes FIRE action on reset for environments that remain idle until the player fires. Executes actions 1 (FIRE) and 2 after reset. Asserts that action 1 is 'FIRE' and at least 3 actions exist.
ImageSaver | gym.Wrapper | ImageSaver(env, img_path, rank) | Debug utility that saves each observed RGB frame as a PNG file to img_path with a filename pattern of out{rank}-{count:05d}.png.
EpisodicLifeEnv | gym.Wrapper | EpisodicLifeEnv(env) | Treats loss of a life as end-of-episode for training purposes, while only performing a true environment reset when all lives are exhausted. Tracks lives and was_real_done state.
MaxAndSkipEnv | gym.Wrapper | MaxAndSkipEnv(env, skip=4) | Repeats each action for skip frames, accumulates rewards, and returns the element-wise maximum of the last two observed frames (to handle Atari sprite flickering). Uses a deque of size 2 for the observation buffer.
ClipRewardEnv | gym.RewardWrapper | ClipRewardEnv(env) | Bins rewards to {+1, 0, -1} using np.sign(reward).
WarpFrame | gym.ObservationWrapper | WarpFrame(env) | Converts RGB observations to grayscale using luminance weights (0.299, 0.587, 0.114) and resizes to 84x84 pixels using bilinear interpolation via PIL. Output shape is (84, 84, 1) with dtype uint8.
FrameStack | gym.Wrapper | FrameStack(env, k) | Stacks the last k single-channel frames along the channel axis (axis=2). Maintains a deque of size k. On reset, fills the buffer with k copies of the initial frame. Output shape is (84, 84, k).
WrapPyTorch | gym.ObservationWrapper | WrapPyTorch(env) | Transposes observations from HWC format (84, 84, C) to CHW format (C, 84, 84) as expected by PyTorch convolutions. Also rescales the observation space to [0.0, 1.0] float32.
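
The frame-skip, max-pooling, and reward-clipping behaviour listed for MaxAndSkipEnv and ClipRewardEnv can be sketched without Gym. This is a minimal illustration, not the module's code: FlickerEnv and max_and_skip_step are hypothetical names, and the two-frame buffer is rebuilt on every call here for brevity rather than kept as wrapper state.

```python
from collections import deque

import numpy as np


class FlickerEnv:
    """Hypothetical stand-in emulator: a sprite is drawn only on even frames."""

    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += 1
        frame = np.zeros((2, 2), dtype=np.uint8)
        if self.t % 2 == 0:
            frame[0, 0] = 255  # sprite visible on even frames only
        reward = 5.0  # unclipped per-frame reward
        return frame, reward, False, {}


def max_and_skip_step(env, action, skip=4):
    """Repeat `action` for `skip` frames, sum rewards, max-pool the last two frames."""
    buffer = deque(maxlen=2)  # keeps only the two most recent frames
    total_reward = 0.0
    done = False
    info = {}
    for _ in range(skip):
        obs, reward, done, info = env.step(action)
        buffer.append(obs)
        total_reward += reward
        if done:
            break
    # Element-wise maximum hides single-frame sprite flicker.
    max_frame = np.max(np.stack(list(buffer)), axis=0)
    return max_frame, total_reward, done, info


env = FlickerEnv()
frame, reward, done, _ = max_and_skip_step(env, action=0)
clipped = float(np.sign(reward))  # reward binning as in ClipRewardEnv
```

Frame 3 is blank and frame 4 contains the sprite, so the pooled frame still shows it; the four per-frame rewards are summed before np.sign reduces the total to +1.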

Factory and Composition Functions

def wrap_deepmind(env, episode_life=True, clip_rewards=True)

Composes wrappers in DeepMind order onto a NoFrameskip Atari environment:

  1. EpisodicLifeEnv (if episode_life=True)
  2. NoopResetEnv(noop_max=30)
  3. MaxAndSkipEnv(skip=4)
  4. FireResetEnv (if 'FIRE' is in the action meanings)
  5. WarpFrame
  6. ClipRewardEnv (if clip_rewards=True)

Note: Frame stacking is intentionally not included in wrap_deepmind and must be applied separately.
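
The conditional ordering above can be illustrated with stand-in classes. This is a sketch only: Wrapper, Base, compose, and layers are hypothetical helpers, and the stand-in wrappers carry the real class names purely for readability; they do no preprocessing.

```python
class Wrapper:
    """Minimal stand-in for gym.Wrapper: it only stores the wrapped env."""

    def __init__(self, env):
        self.env = env


class Base:
    """Hypothetical innermost environment."""


def make_wrapper(name):
    # Create a named, behaviour-free wrapper class for illustration only.
    return type(name, (Wrapper,), {})


EpisodicLifeEnv = make_wrapper("EpisodicLifeEnv")
NoopResetEnv = make_wrapper("NoopResetEnv")
MaxAndSkipEnv = make_wrapper("MaxAndSkipEnv")
FireResetEnv = make_wrapper("FireResetEnv")
WarpFrame = make_wrapper("WarpFrame")
ClipRewardEnv = make_wrapper("ClipRewardEnv")


def compose(env, episode_life=True, has_fire=True, clip_rewards=True):
    """Apply the stand-in wrappers in the order listed above."""
    if episode_life:
        env = EpisodicLifeEnv(env)
    env = NoopResetEnv(env)
    env = MaxAndSkipEnv(env)
    if has_fire:  # stands in for the 'FIRE' in action meanings check
        env = FireResetEnv(env)
    env = WarpFrame(env)
    if clip_rewards:
        env = ClipRewardEnv(env)
    return env


def layers(env):
    """List wrapper names from outermost to innermost."""
    names = []
    while isinstance(env, Wrapper):
        names.append(type(env).__name__)
        env = env.env
    return names


full = layers(compose(Base()))
minimal = layers(compose(Base(), episode_life=False, has_fire=False, clip_rewards=False))
```

The wrapper applied last ends up outermost, so a step() call enters through ClipRewardEnv and reaches the emulator last.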

def make_env(env_id, img_dir, seed, rank)

Returns a thunk (zero-argument callable) that creates a single wrapped environment. The thunk:

  1. Creates a Gym environment with gym.make(env_id)
  2. Seeds it with seed + rank
  3. Optionally wraps with ImageSaver if img_dir is not None
  4. Applies wrap_deepmind
  5. Applies WrapPyTorch for CHW format

def make(env_name, img_dir, num_processes)

Top-level entry point that creates a SubprocVecEnv with num_processes parallel environments. Uses seed 1337 as the base seed.
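
The thunk pattern used by make_env and make can be sketched as follows. This is an illustration under stated assumptions: make_env_sketch is a hypothetical name, and a dict stands in for the environment that a real thunk would build with gym.make and the wrappers.

```python
def make_env_sketch(env_id, seed, rank):
    """Return a zero-argument callable that builds one stand-in environment."""

    def _thunk():
        # A real thunk would create and wrap the Gym environment here;
        # construction is deferred until the thunk is called.
        return {"env_id": env_id, "seed": seed + rank}

    return _thunk


BASE_SEED = 1337  # base seed stated above
thunks = [make_env_sketch("PongNoFrameskip-v4", BASE_SEED, rank) for rank in range(4)]

# Nothing is built until each thunk is invoked (in the module, inside a subprocess).
envs = [thunk() for thunk in thunks]
seeds = [env["seed"] for env in envs]
```

Deferring construction lets each worker process build its own environment, and offsetting the seed by rank keeps the parallel environments from replaying identical random sequences.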

Vectorized Environment Classes

class VecEnv(object)

Abstract base class for vectorized environments. Defines the interface: step(vac), reset(), and close().

class SubprocVecEnv(VecEnv)

Runs multiple environments in parallel using Python's multiprocessing module. Each environment runs in its own subprocess, communicating with the main process via Pipe. Uses CloudpickleWrapper to serialize environment factory functions for cross-process transfer.

Key methods:

  • step(actions) -- sends actions to all subprocesses, collects results, returns stacked (obs, rewards, dones, infos) as NumPy arrays. Automatically resets environments on done.
  • reset() -- resets all environments, returns stacked observations.
  • close() -- sends close command to all workers and joins processes.
  • num_envs (property) -- returns the number of parallel environments.
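
The command loop behind these methods can be sketched with the standard library alone. This is a simplified illustration, not the module's implementation: CounterEnv, worker, and run_demo are hypothetical names, the command strings are assumed, and the fork start method is requested explicitly, which assumes a POSIX host.

```python
import multiprocessing as mp


class CounterEnv:
    """Hypothetical toy environment: the episode ends after `horizon` steps."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, float(action), self.t >= self.horizon, {}


def worker(remote, horizon):
    """Serve commands from the parent process until told to close."""
    env = CounterEnv(horizon)
    while True:
        cmd, data = remote.recv()
        if cmd == "step":
            obs, reward, done, info = env.step(data)
            if done:
                obs = env.reset()  # auto-reset: the parent receives a fresh observation
            remote.send((obs, reward, done, info))
        elif cmd == "reset":
            remote.send(env.reset())
        elif cmd == "close":
            remote.close()
            break


def run_demo(num_envs=2, horizon=2):
    ctx = mp.get_context("fork")  # assumption: fork is available (POSIX)
    pipes = [ctx.Pipe() for _ in range(num_envs)]
    procs = [ctx.Process(target=worker, args=(child, horizon), daemon=True)
             for _, child in pipes]
    for proc in procs:
        proc.start()
    parents = [parent for parent, _ in pipes]

    for remote in parents:
        remote.send(("reset", None))
    first_obs = [remote.recv() for remote in parents]

    history = []
    for _ in range(horizon):
        # Send every action first, then collect, so workers run concurrently.
        for action, remote in enumerate(parents):
            remote.send(("step", action))
        history.append([remote.recv() for remote in parents])

    for remote in parents:
        remote.send(("close", None))
    for proc in procs:
        proc.join()
    return first_obs, history


if __name__ == "__main__":
    print(run_demo())
```

On the step that finishes an episode, the worker reports done as True but returns the observation from the reset that follows, which is how the caller never has to reset individual environments itself.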

Helper Classes

class CloudpickleWrapper(object)

Wraps an object so that multiprocessing uses cloudpickle for serialization instead of the default pickle, enabling serialization of lambda functions and closures passed to subprocess workers.
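
A minimal sketch of this idea, assuming the wrapper hooks into pickling through __getstate__ and __setstate__ (CloudpickleWrapperSketch and plain_pickle_fails are hypothetical names). cloudpickle is imported lazily, so only the act of pickling the wrapper requires the third-party package.

```python
import pickle


class CloudpickleWrapperSketch:
    """Serialize the wrapped object with cloudpickle, restore it with pickle."""

    def __init__(self, x):
        self.x = x

    def __getstate__(self):
        import cloudpickle  # third-party; needed only when the wrapper is pickled
        return cloudpickle.dumps(self.x)

    def __setstate__(self, ob):
        self.x = pickle.loads(ob)


def plain_pickle_fails(obj):
    """Return True if the standard pickler cannot serialize `obj`."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return True
    return False


rank = 3
thunk = lambda: 1337 + rank  # a lambda like those handed to subprocess workers
wrapped = CloudpickleWrapperSketch(thunk)
```

The standard pickler stores functions by qualified name, which a lambda does not have in a resolvable form, so plain_pickle_fails(thunk) is True; wrapping the callable routes serialization through cloudpickle instead.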

I/O Contract

Inputs

  • Environment ID: A string identifying an Atari NoFrameskip-v4 environment (e.g., "SpaceInvadersNoFrameskip-v4").
  • Actions: Integer actions for discrete Atari environments, or a NumPy array of actions for vectorized environments.
  • Configuration parameters: noop_max (int, default 30), skip (int, default 4), k for frame stacking, num_processes for parallelism.

Outputs

  • Single environment observations: NumPy uint8 array of shape (1, 84, 84) after WrapPyTorch (CHW format, single grayscale channel).
  • Vectorized observations: NumPy array of shape (num_processes, C, 84, 84) where C depends on frame stacking.
  • Rewards: Clipped to {-1, 0, +1} when clip_rewards=True.
  • Done flags: Boolean, triggered on life loss (not just game over) when episode_life=True.
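
The output shapes and reward clipping listed above can be checked with plain NumPy. The arrays below are zero-filled placeholders rather than real emulator frames, and num_processes = 3 is an arbitrary choice for illustration.

```python
import numpy as np

num_processes = 3

# One WarpFrame-style observation: height, width, channel (HWC), uint8.
hwc = np.zeros((84, 84, 1), dtype=np.uint8)

# WrapPyTorch-style transpose to channel-first (CHW) layout.
chw = hwc.transpose(2, 0, 1)

# Stacking one observation per environment adds the leading batch axis.
batch = np.stack([chw] * num_processes)

# Reward clipping keeps only the sign of each raw reward.
raw_rewards = np.array([7.0, 0.0, -3.0])
clipped = np.sign(raw_rewards)
```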

Invariants

  • The environment ID must contain "NoFrameskip" -- the wrap_deepmind function asserts this.
  • Frame skipping and max-pooling are handled by MaxAndSkipEnv, not by the underlying Atari emulator.
  • Subprocess environments automatically reset when an episode ends.
  • FrameStack requires single-channel input frames (asserts shp[2] == 1).
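
The FrameStack invariant can be illustrated with a deque-based sketch that works on raw frames instead of a Gym environment. FrameStackSketch is a hypothetical name, and its reset and step methods take the frame directly, which differs from the wrapper's real interface.

```python
from collections import deque

import numpy as np


class FrameStackSketch:
    """Keep the last k single-channel frames and join them on the channel axis."""

    def __init__(self, frame_shape, k):
        # Mirrors the invariant above: input frames must be (H, W, 1).
        assert frame_shape[2] == 1, "expects single-channel (H, W, 1) frames"
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # Fill the buffer with k copies of the initial frame.
        for _ in range(self.k):
            self.frames.append(first_frame)
        return self._observation()

    def step(self, frame):
        # Appending to a full deque drops the oldest frame.
        self.frames.append(frame)
        return self._observation()

    def _observation(self):
        assert len(self.frames) == self.k
        return np.concatenate(list(self.frames), axis=2)


stack = FrameStackSketch((84, 84, 1), k=4)
obs0 = stack.reset(np.full((84, 84, 1), 7, dtype=np.uint8))
obs1 = stack.step(np.full((84, 84, 1), 9, dtype=np.uint8))
```

After one step the newest frame occupies the last channel, while the three remaining channels still hold copies of the reset frame.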

Usage Examples

Creating a Single Wrapped Environment

import gym
from atari_wrappers import wrap_deepmind, FrameStack, WrapPyTorch

env = gym.make("BreakoutNoFrameskip-v4")
env = wrap_deepmind(env, episode_life=True, clip_rewards=True)
env = FrameStack(env, k=4)
env = WrapPyTorch(env)

obs = env.reset()   # shape: (4, 84, 84)
obs, reward, done, info = env.step(1)   # action 1 is FIRE in Breakout

Creating Parallel Environments for Training

import numpy as np
from atari_wrappers import make

# Create 16 parallel Atari environments
envs = make("PongNoFrameskip-v4", img_dir=None, num_processes=16)

obs = envs.reset()           # shape: (16, 1, 84, 84)
# One integer action per environment; action 0 is NOOP
actions = np.zeros(16, dtype=np.int64)
obs, rewards, dones, infos = envs.step(actions)
envs.close()

Dependencies

  • gym -- OpenAI Gym for Atari environments
  • numpy -- array operations and frame stacking
  • PIL (Pillow) -- image resizing in WarpFrame
  • multiprocessing -- subprocess-based parallelism in SubprocVecEnv
  • collections.deque -- fixed-size buffers for frame history
  • cloudpickle -- serialization of closures for multiprocessing (used via CloudpickleWrapper)

Design Rationale

The wrapper composition pattern follows the decorator design: each wrapper addresses one specific preprocessing concern, and they compose cleanly by nesting. This matches the DeepMind Atari preprocessing pipeline from "Human-level control through deep reinforcement learning" (Mnih et al., 2015). The separation of wrap_deepmind from FrameStack allows flexibility -- some algorithms may want different stack depths or no stacking at all.

The SubprocVecEnv uses true OS-level parallelism (not threading) because Atari emulation is CPU-bound and the GIL would prevent effective parallelism with threads.
