Implementation:LaurentMazare Tch rs Atari Wrappers

Overview

atari_wrappers.py is a Python module located at examples/reinforcement-learning/atari_wrappers.py (308 lines) in the tch-rs repository. It provides DeepMind-style Atari environment wrappers for deep reinforcement learning training. The module consolidates observation preprocessing, reward shaping, frame manipulation, and multi-process environment vectorization into a composable wrapper pipeline. These wrappers transform raw Atari game environments from OpenAI Gym into a form suitable for training deep RL agents such as A2C and PPO, following the conventions established in the DQN Nature paper and subsequent DeepMind publications.

The module is organized into four logical sections: Atari game wrappers, environment factory functions, a vectorized environment base class, and a subprocess-based parallel environment runner.

Code Reference

Environment Wrapper Classes

All wrapper classes extend gym.Wrapper (or a specialized subclass) and follow the standard OpenAI Gym wrapper protocol.

Class	Base Class	Signature	Purpose
NoopResetEnv	`gym.Wrapper`	`NoopResetEnv(env, noop_max=30)`	Samples initial states by executing a random number of no-op actions (action 0) on reset, between 1 and `noop_max`. Asserts that action 0 is 'NOOP'.
FireResetEnv	`gym.Wrapper`	`FireResetEnv(env)`	Takes FIRE action on reset for environments that remain idle until the player fires. Executes actions 1 (FIRE) and 2 after reset. Asserts that action 1 is 'FIRE' and at least 3 actions exist.
ImageSaver	`gym.Wrapper`	`ImageSaver(env, img_path, rank)`	Debug utility that saves each observed RGB frame as a PNG file to `img_path` with a filename pattern of `out{rank}-{count:05d}.png`.
EpisodicLifeEnv	`gym.Wrapper`	`EpisodicLifeEnv(env)`	Treats loss of a life as end-of-episode for training purposes, while only performing a true environment reset when all lives are exhausted. Tracks `lives` and `was_real_done` state.
MaxAndSkipEnv	`gym.Wrapper`	`MaxAndSkipEnv(env, skip=4)`	Repeats each action for `skip` frames, accumulates rewards, and returns the element-wise maximum of the last two observed frames (to handle Atari sprite flickering). Uses a deque of size 2 for the observation buffer.
ClipRewardEnv	`gym.RewardWrapper`	`ClipRewardEnv(env)`	Bins rewards to {+1, 0, -1} using `np.sign(reward)`.
WarpFrame	`gym.ObservationWrapper`	`WarpFrame(env)`	Converts RGB observations to grayscale using luminance weights (0.299, 0.587, 0.114) and resizes to 84x84 pixels using bilinear interpolation via PIL. Output shape is `(84, 84, 1)` with dtype uint8.
FrameStack	`gym.Wrapper`	`FrameStack(env, k)`	Stacks the last `k` single-channel frames along the channel axis (axis=2). Maintains a deque of size `k`. On reset, fills the buffer with `k` copies of the initial frame. Output shape is `(84, 84, k)`.
WrapPyTorch	`gym.ObservationWrapper`	`WrapPyTorch(env)`	Transposes observations from HWC format `(84, 84, C)` to CHW format `(C, 84, 84)` as expected by PyTorch convolutions. Also redeclares the observation space with bounds [0.0, 1.0] and dtype float32; this changes only the declared space, and the returned arrays stay uint8 (see Outputs).
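
The per-frame transformations in the table can be sketched with plain NumPy. This is a minimal illustration, not the module's code: the function names are hypothetical, the 84x84 resize done by WarpFrame is omitted, and the 210x160 input size is only an assumed example frame size.

```python
import numpy as np


def clip_reward(reward):
    """ClipRewardEnv-style binning: map any reward to -1.0, 0.0, or +1.0."""
    return float(np.sign(reward))


def max_of_last_two(frames):
    """MaxAndSkipEnv-style pooling: element-wise max over the last two frames.

    Works with a single frame too, in which case it returns that frame.
    """
    return np.max(np.stack(frames[-2:]), axis=0)


def to_grayscale(rgb):
    """WarpFrame-style luminance conversion (resizing is left out of this sketch)."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = rgb.astype(np.float32) @ weights      # (H, W, 3) @ (3,) -> (H, W)
    return gray.astype(np.uint8)[:, :, None]     # keep a trailing channel axis


def hwc_to_chw(obs):
    """WrapPyTorch-style transpose from (H, W, C) to (C, H, W)."""
    return obs.transpose(2, 0, 1)


# Hypothetical usage on a blank RGB frame.
frame = np.zeros((210, 160, 3), dtype=np.uint8)
gray = to_grayscale(frame)       # shape (210, 160, 1)
chw = hwc_to_chw(gray)           # shape (1, 210, 160)
```

Keeping each transformation as a separate small function mirrors how the wrappers each handle one concern.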

Factory and Composition Functions

def wrap_deepmind(env, episode_life=True, clip_rewards=True)

Composes wrappers in DeepMind order onto a NoFrameskip Atari environment:

EpisodicLifeEnv (if episode_life=True)
NoopResetEnv(noop_max=30)
MaxAndSkipEnv(skip=4)
FireResetEnv (if 'FIRE' is in the action meanings)
WarpFrame
ClipRewardEnv (if clip_rewards=True)

Note: Frame stacking is intentionally not included in wrap_deepmind and must be applied separately.
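
The conditional ordering above can be made explicit with a small helper that only lists wrapper names. This is a sketch under assumptions: `deepmind_wrapper_order` is a hypothetical function, not part of the module, and the action-meaning lists in the usage lines are made up for illustration.

```python
def deepmind_wrapper_order(action_meanings, episode_life=True, clip_rewards=True):
    """Return wrapper names in application order (first applied = innermost)."""
    order = []
    if episode_life:
        order.append("EpisodicLifeEnv")
    order.append("NoopResetEnv(noop_max=30)")
    order.append("MaxAndSkipEnv(skip=4)")
    if "FIRE" in action_meanings:
        # Only environments that expose a FIRE action get this wrapper.
        order.append("FireResetEnv")
    order.append("WarpFrame")
    if clip_rewards:
        order.append("ClipRewardEnv")
    return order


# Hypothetical action sets: one with FIRE, one without.
with_fire = deepmind_wrapper_order(["NOOP", "FIRE", "RIGHT", "LEFT"])
without_fire = deepmind_wrapper_order(["NOOP", "UP", "DOWN"], clip_rewards=False)
```

FrameStack does not appear in either list, which matches the note that stacking is applied separately.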

def make_env(env_id, img_dir, seed, rank)

Returns a thunk (zero-argument callable) that creates a single wrapped environment. The thunk:

Creates a Gym environment with gym.make(env_id)
Seeds it with seed + rank
Optionally wraps with ImageSaver if img_dir is not None
Applies wrap_deepmind
Applies WrapPyTorch for CHW format

def make(env_name, img_dir, num_processes)

Top-level entry point that creates a SubprocVecEnv with num_processes parallel environments. Uses seed 1337 as the base seed.
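
The thunk pattern used by make_env and make can be sketched without Gym. In this illustration the environment is replaced by a plain dict, and `make_env_thunk` and `make_thunks` are hypothetical names; only the base seed 1337 and the `seed + rank` rule come from the description above.

```python
BASE_SEED = 1337  # base seed stated for make()


def make_env_thunk(env_id, seed, rank):
    """Return a zero-argument callable that builds one environment when called."""
    def _thunk():
        # Placeholder for: gym.make(env_id), seeding, wrap_deepmind, WrapPyTorch.
        # A dict stands in for the environment so the sketch has no dependencies.
        return {"env_id": env_id, "seed": seed + rank}
    return _thunk


def make_thunks(env_id, num_processes, base_seed=BASE_SEED):
    """Build one thunk per worker; each gets a distinct seed via its rank."""
    return [make_env_thunk(env_id, base_seed, rank) for rank in range(num_processes)]


thunks = make_thunks("PongNoFrameskip-v4", num_processes=4)
seeds = [thunk()["seed"] for thunk in thunks]
```

Deferring construction to a thunk means the environment can be created inside the worker that will run it, rather than being built in the parent and transferred.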

Vectorized Environment Classes

class VecEnv(object)

Abstract base class for vectorized environments. Defines the interface: step(vac), reset(), and close().

class SubprocVecEnv(VecEnv)

Runs multiple environments in parallel using Python's multiprocessing module. Each environment runs in its own subprocess, communicating with the main process via Pipe. Uses CloudpickleWrapper to serialize environment factory functions for cross-process transfer.

Key methods:

step(actions) -- sends actions to all subprocesses, collects results, returns stacked (obs, rewards, dones, infos) as NumPy arrays. Automatically resets environments on done.
reset() -- resets all environments, returns stacked observations.
close() -- sends close command to all workers and joins processes.
num_envs (property) -- returns the number of parallel environments.
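
The command protocol behind these methods can be sketched as a worker loop reading (command, data) pairs from a Pipe. This is an assumption-laden illustration: `ToyEnv`, `worker`, and `demo` are hypothetical, the command strings are guesses at the style of protocol, and the worker runs in a thread here only so the sketch runs anywhere, whereas the module is described as using one subprocess per environment.

```python
import multiprocessing as mp
import threading


class ToyEnv:
    """Hypothetical stand-in for a wrapped env: the episode ends after `horizon` steps."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, float(action), done, {}


def worker(remote, env_fn):
    """Serve step/reset/close commands for a single environment."""
    env = env_fn()
    while True:
        cmd, data = remote.recv()
        if cmd == "step":
            ob, reward, done, info = env.step(data)
            if done:
                ob = env.reset()  # automatic reset when the episode ends
            remote.send((ob, reward, done, info))
        elif cmd == "reset":
            remote.send(env.reset())
        elif cmd == "close":
            remote.close()
            break
        else:
            raise NotImplementedError(cmd)


def demo():
    """Drive one worker through a reset, three steps, and a close."""
    parent, child = mp.Pipe()
    thread = threading.Thread(target=worker, args=(child, ToyEnv), daemon=True)
    thread.start()
    parent.send(("reset", None))
    first = parent.recv()
    results = []
    for _ in range(3):
        parent.send(("step", 1))
        results.append(parent.recv())
    parent.send(("close", None))
    thread.join()
    return first, results
```

A vectorized step would send one action per pipe, receive one result per pipe, and stack the observations, rewards, and done flags.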

Helper Classes

class CloudpickleWrapper(object)

Wraps an object so that multiprocessing uses cloudpickle for serialization instead of the default pickle, enabling serialization of lambda functions and closures passed to subprocess workers.

I/O Contract

Inputs

Environment ID: A string identifying an Atari NoFrameskip-v4 environment (e.g., "SpaceInvadersNoFrameskip-v4").
Actions: Integer actions for discrete Atari environments, or a NumPy array of actions for vectorized environments.
Configuration parameters: noop_max (int, default 30), skip (int, default 4), k for frame stacking, num_processes for parallelism.

Outputs

Single environment observations: NumPy uint8 array of shape (1, 84, 84) after WrapPyTorch (CHW format, single grayscale channel).
Vectorized observations: NumPy array of shape (num_processes, C, 84, 84) where C depends on frame stacking.
Rewards: Clipped to {-1, 0, +1} when clip_rewards=True.
Done flags: Boolean, triggered on life loss (not just game over) when episode_life=True.
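
The done-flag behaviour under episode_life=True can be sketched as a pure function. This is a hypothetical helper, not the wrapper itself; it assumes a life loss is detected by comparing the current life count against the previous one, with a count of zero left to the game-over signal.

```python
def episodic_life_done(prev_lives, lives, game_over):
    """Return (done, was_real_done) for one step.

    done is what the training loop sees; was_real_done records whether the
    underlying game actually ended, so a true reset is needed only then.
    """
    was_real_done = bool(game_over)
    life_lost = 0 < lives < prev_lives  # lost a life but the game continues
    done = was_real_done or life_lost
    return done, was_real_done


life_lost_step = episodic_life_done(prev_lives=3, lives=2, game_over=False)
normal_step = episodic_life_done(prev_lives=3, lives=3, game_over=False)
final_step = episodic_life_done(prev_lives=1, lives=0, game_over=True)
```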

Invariants

The environment ID must contain "NoFrameskip" -- the wrap_deepmind function asserts this.
Frame skipping and max-pooling are handled by MaxAndSkipEnv, not by the underlying Atari emulator.
Subprocess environments automatically reset when an episode ends.
FrameStack requires single-channel input frames (asserts shp[2] == 1).
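
The single-channel invariant and the deque-based buffer behind FrameStack can be sketched as follows. `FrameBuffer` is a hypothetical class written for illustration; it is not a gym.Wrapper and handles only the buffering.

```python
from collections import deque

import numpy as np


class FrameBuffer:
    """Keep the last k single-channel frames and expose them stacked on axis 2."""

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)  # oldest frame is dropped automatically

    def reset(self, frame):
        """Fill the buffer with k copies of the initial frame."""
        assert frame.shape[2] == 1, "can only stack 1-channel frames"
        for _ in range(self.k):
            self.frames.append(frame)
        return self.observation()

    def push(self, frame):
        """Append the newest frame and return the stacked observation."""
        assert frame.shape[2] == 1, "can only stack 1-channel frames"
        self.frames.append(frame)
        return self.observation()

    def observation(self):
        assert len(self.frames) == self.k
        return np.concatenate(list(self.frames), axis=2)  # (H, W, k)


buf = FrameBuffer(k=4)
obs0 = buf.reset(np.zeros((84, 84, 1), dtype=np.uint8))
obs1 = buf.push(np.ones((84, 84, 1), dtype=np.uint8))
```

After the push, the newest frame occupies the last channel and the three older channels still hold the initial frame.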

Usage Examples

Creating a Single Wrapped Environment

import gym
from atari_wrappers import wrap_deepmind, FrameStack, WrapPyTorch

env = gym.make("BreakoutNoFrameskip-v4")
env = wrap_deepmind(env, episode_life=True, clip_rewards=True)
env = FrameStack(env, k=4)
env = WrapPyTorch(env)

obs = env.reset()   # shape: (4, 84, 84)
obs, reward, done, info = env.step(1)

Creating Parallel Environments for Training

import numpy as np
from atari_wrappers import make

# Create 16 parallel Atari environments
envs = make("PongNoFrameskip-v4", img_dir=None, num_processes=16)

obs = envs.reset()           # shape: (16, 1, 84, 84)
actions = np.zeros(16, dtype=np.int64)   # placeholder: one discrete action (0 = NOOP) per environment
obs, rewards, dones, infos = envs.step(actions)
envs.close()

Dependencies

gym -- OpenAI Gym for Atari environments
numpy -- array operations and frame stacking
PIL (Pillow) -- image resizing in WarpFrame
multiprocessing -- subprocess-based parallelism in SubprocVecEnv
collections.deque -- fixed-size buffers for frame history
cloudpickle -- serialization of closures for multiprocessing (used via CloudpickleWrapper)

Design Rationale

The wrapper composition follows the decorator design pattern: each wrapper addresses one preprocessing concern, and wrappers compose by nesting. This matches the DeepMind Atari preprocessing pipeline from "Human-level control through deep reinforcement learning" (Mnih et al., 2015). Keeping FrameStack out of wrap_deepmind leaves the stack depth up to the caller -- some algorithms may want a different depth or no stacking at all.

The SubprocVecEnv uses true OS-level parallelism (not threading) because Atari emulation is CPU-bound and the GIL would prevent effective parallelism with threads.

Related Pages

Principle:LaurentMazare_Tch_rs_Atari_Environment_Preprocessing -- The guiding principle behind Atari observation preprocessing for deep RL
