Principle:Farama Foundation Gymnasium Atari Preprocessing

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, Image_Preprocessing
Last Updated 2026-02-15 03:00 GMT

Overview

A standard preprocessing pipeline for Atari 2600 game environments applies frame skipping, max-pooling, grayscale conversion, and resizing to produce efficient, consistent inputs for learning agents.

Description

Atari preprocessing implements the standard set of image processing steps that have become the de facto pipeline for training RL agents on Atari 2600 games. This pipeline was formalized by Machado et al. (2018) in "Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents" and builds upon the original preprocessing used by Mnih et al. (2015) in the seminal DQN paper. The preprocessing converts the raw 210x160 RGB Atari frames into compact, normalized representations suitable for convolutional neural network input.

The pipeline consists of several sequential stages. No-op reset introduces stochasticity at the beginning of each episode by executing a random number of no-operation actions. Frame skipping (typically 4 frames) reduces temporal redundancy by having the agent observe and act only every k-th frame, with the intermediate frames using the same action. Max-pooling over the last two raw frames eliminates flickering artifacts caused by the Atari hardware's sprite rendering limitations. Grayscale conversion reduces the 3-channel RGB image to a single channel. Resizing reduces the spatial dimensions (typically to 84x84 pixels) to decrease computational cost. Optional scaling normalizes pixel values to the [0, 1) range.

This preprocessing has become so standard that virtually all Atari RL benchmarks use it, making results directly comparable across papers. The pipeline is designed to be combined with frame stacking (provided by a separate wrapper) to give the agent a short temporal history for velocity estimation. Terminal-on-life-loss is an optional setting that signals episode termination when a life is lost, though Machado et al. do not recommend it.

Usage

Use Atari preprocessing whenever training RL agents on Atari 2600 games to ensure compatibility with standard benchmarks. Combine it with frame stacking (FrameStackObservation wrapper) for temporal context. Use the default settings (frame_skip=4, screen_size=84, grayscale_obs=True) unless the specific experiment requires deviations. Set terminal_on_life_loss=True only if the experimental protocol explicitly requires it. Use scale_obs=True when the learning algorithm expects normalized inputs.

Theoretical Basis

The preprocessing pipeline can be described as a sequence of transformations applied to raw Atari frames:

No-op reset: At episode start, execute \(k \sim \mathrm{Uniform}(0, n_{\mathrm{noop\_max}})\) no-operation actions.

Frame skipping: The agent selects an action every k frames. The same action is repeated for k consecutive simulator steps:

\[ o_t = \mathrm{max\_pool}(f_{t \cdot k - 1},\, f_{t \cdot k}) \]

where \(f_i\) denotes the raw frame at simulator step \(i\) and \(\mathrm{max\_pool}\) takes the element-wise maximum to handle sprite flickering.
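The flicker-removal effect of this max-pool can be seen in a toy example (pure NumPy; the two synthetic frames below stand in for consecutive raw frames in which sprites are rendered only intermittently):

```python
import numpy as np

# Two consecutive 4x4 "frames": due to hardware rendering limits,
# each sprite is drawn in only one of the two frames.
frame_a = np.zeros((4, 4), dtype=np.uint8)
frame_b = np.zeros((4, 4), dtype=np.uint8)
frame_a[1, 2] = 200  # sprite visible only in frame a
frame_b[3, 0] = 180  # sprite visible only in frame b

# Element-wise maximum keeps every sprite that appeared in either frame.
pooled = np.maximum(frame_a, frame_b)
print(pooled[1, 2], pooled[3, 0])  # 200 180
```

Either frame alone would be missing a sprite; the pooled frame contains both.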

Grayscale conversion:

\[ o_{\mathrm{gray}} = 0.299\,R + 0.587\,G + 0.114\,B \]
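These luma weights can be applied to an RGB frame with a single dot product, as in this sketch (pure NumPy; the 2x2 frame is synthetic):

```python
import numpy as np

rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[0, 0] = (255, 0, 0)      # pure red
rgb[0, 1] = (0, 255, 0)      # pure green
rgb[1, 0] = (0, 0, 255)      # pure blue
rgb[1, 1] = (255, 255, 255)  # white

# Weighted sum over the channel axis collapses (H, W, 3) to (H, W).
weights = np.array([0.299, 0.587, 0.114])
gray = rgb.astype(np.float32) @ weights
print(np.round(gray))  # red ~76, green ~150, blue ~29, white 255
```

Green contributes most to perceived brightness, which is why its weight dominates.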

Resizing via area interpolation (Gymnasium's implementation uses OpenCV's INTER_AREA):

\[ o_{\mathrm{resized}} = \mathrm{resize}(o_{\mathrm{gray}}, (h, w)) \]

where \((h, w)\) is typically \((84, 84)\).
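Area interpolation averages each output pixel over the source pixels it covers. For an integer downscale factor this reduces to a block mean, which can be sketched in pure NumPy (a hypothetical 168x168 frame halved to 84x84; the real 210x160 to 84x84 case involves fractional pixel footprints and is handled by libraries such as OpenCV):

```python
import numpy as np

def area_downsample(img, factor):
    """Block-mean downsample: the integer-factor case of area interpolation."""
    h, w = img.shape
    assert h % factor == 0 and w % factor == 0
    blocks = img.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

frame = np.arange(168 * 168, dtype=np.float32).reshape(168, 168)
small = area_downsample(frame, 2)
print(small.shape)  # (84, 84)
```

Averaging over source pixels avoids the aliasing that naive subsampling would introduce when shrinking an image.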

Optional scaling:

\[ o_{\mathrm{scaled}} = o_{\mathrm{resized}} / 255.0 \]
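A sketch of the scaling step (synthetic pixel values): the division itself is trivial, but the uint8 to float32 conversion quadruples memory, which is why replay buffers commonly store unscaled uint8 frames and scale only at network input time.

```python
import numpy as np

# Optional scaling: uint8 pixels -> float32 values in [0, 1].
obs_uint8 = np.array([[0, 128, 255]], dtype=np.uint8)
obs_scaled = obs_uint8.astype(np.float32) / 255.0

print(obs_scaled.dtype)                     # float32
print(obs_uint8.nbytes, obs_scaled.nbytes)  # 3 vs 12 bytes: 4x the memory
```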

# Full preprocessing pipeline per agent step (illustrative sketch)
import cv2
import numpy as np

for i in range(frame_skip):  # repeat the chosen action for frame_skip steps
    frame, reward, terminated, truncated, info = env.step(action)
    if i == frame_skip - 2:
        raw_frame_1 = frame  # raw frame at t*k - 1
    elif i == frame_skip - 1:
        raw_frame_2 = frame  # raw frame at t*k
pooled = np.maximum(raw_frame_1, raw_frame_2)  # max-pool removes flicker
gray = cv2.cvtColor(pooled, cv2.COLOR_RGB2GRAY)
resized = cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA)
if scale_obs:
    observation = resized.astype(np.float32) / 255.0

The resulting observation shape is (84, 84) for grayscale or (84, 84, 3) for RGB, compared to the original (210, 160, 3) -- roughly a 14x reduction in input dimensionality for grayscale (100,800 values down to 7,056), or about 4.8x if color is kept.
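The reduction factors follow from quick arithmetic (a pure-Python check):

```python
# Input-size reduction from raw frames to preprocessed observations.
raw = 210 * 160 * 3  # 100800 values per raw RGB frame
gray = 84 * 84       # 7056 values per grayscale observation
rgb = 84 * 84 * 3    # 21168 values if color is kept

print(round(raw / gray, 1))  # 14.3x reduction for grayscale
print(round(raw / rgb, 2))   # 4.76x reduction for RGB
```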
