Implementation:Farama Foundation Gymnasium CliffWalkingEnv

Knowledge Sources	Farama_Foundation_Gymnasium Gymnasium Docs
Domains	Reinforcement_Learning, Toy_Text_Environments
Last Updated	2026-02-15 03:00 GMT

Overview

CliffWalkingEnv is the standard (non-JAX) Cliff Walking environment: a 4x12 gridworld in which an agent navigates from a start cell to a goal cell while avoiding a cliff. It is registered as CliffWalking-v1.

Description

The CliffWalkingEnv class implements the classic Cliff Walking problem as a standard Gymnasium Env subclass with pre-computed transition probabilities.

Environment Layout: A 4x12 grid where the player starts at state 36 (position [3, 0]) and the goal is at state 47 (position [3, 11]). A cliff occupies positions [3, 1] through [3, 10]. Stepping onto the cliff sends the player back to the start with -100 reward.
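
The layout can be reproduced with a few lines of NumPy. This is an illustrative sketch rather than the library's source; the names SHAPE, cliff, start_state, and goal_state are chosen here for the example.

```python
import numpy as np

# Grid dimensions from the description: 4 rows, 12 columns.
SHAPE = (4, 12)

# Boolean mask of cliff cells: bottom row, columns 1 through 10.
cliff = np.zeros(SHAPE, dtype=bool)
cliff[3, 1:11] = True

# Flatten (row, col) into a state index in row-major order (row * 12 + col).
start_state = int(np.ravel_multi_index((3, 0), SHAPE))
goal_state = int(np.ravel_multi_index((3, 11), SHAPE))

print(start_state, goal_state, int(cliff.sum()))  # 36 47 10
```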

Transition Model: All transition probabilities are pre-computed in __init__ and stored in the self.P dictionary, structured as P[state][action] = [(probability, next_state, reward, terminated)]. This tabular format enables direct use with dynamic programming algorithms. The _calculate_transition_prob method computes transitions, handling boundary clipping and cliff detection.
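
To illustrate why the tabular format suits dynamic programming, the sketch below runs value iteration over a table with the same P[state][action] structure. So that it runs without Gymnasium installed, it rebuilds a deterministic table from the description above; build_P and value_iteration are hypothetical helper names, not part of the library, and the undiscounted setting (gamma=1.0) is an assumption made for the example.

```python
# Hypothetical stand-in for the environment's P table (non-slippery case),
# rebuilt from the description so the example is self-contained.
ROWS, COLS = 4, 12
START, GOAL = 36, 47
MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}  # up, right, down, left


def build_P():
    P = {}
    for s in range(ROWS * COLS):
        row, col = divmod(s, COLS)
        P[s] = {}
        for a, (dr, dc) in MOVES.items():
            # Clip the move to the grid boundary.
            r = min(max(row + dr, 0), ROWS - 1)
            c = min(max(col + dc, 0), COLS - 1)
            if r == ROWS - 1 and 1 <= c <= COLS - 2:
                # Cliff cell: back to the start with reward -100.
                P[s][a] = [(1.0, START, -100, False)]
            else:
                ns = r * COLS + c
                P[s][a] = [(1.0, ns, -1, ns == GOAL)]
    return P


def value_iteration(P, gamma=1.0, tol=1e-9):
    """In-place value iteration over a P[state][action] table."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            if s == GOAL:
                continue  # terminal state keeps value 0
            best = max(
                sum(p * (r + gamma * (0.0 if done else V[ns]))
                    for p, ns, r, done in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


P = build_P()
V = value_iteration(P)
print(V[START])  # -13.0: one step up, eleven right, one down
```

With the real environment, the same value_iteration function should accept env.unwrapped.P, provided the goal state is treated as terminal in the same way.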

Slippery Mode: When is_slippery=True, the agent moves in the intended direction with probability 1/3 and in each perpendicular direction with probability 1/3. In non-slippery mode (default), transitions are deterministic (probability 1.0). Two variants are registered: CliffWalking-v1 (non-slippery) and CliffWalkingSlippery-v1 (slippery).
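
The slippery outcome distribution can be sketched as follows. This assumes the two perpendicular directions are the neighbouring action indices modulo 4, which follows from the 0=up, 1=right, 2=down, 3=left ordering; slippery_directions is a hypothetical helper, not a library function.

```python
from fractions import Fraction


def slippery_directions(action):
    """Return (probability, direction) pairs for one intended action.

    Assumption: the perpendicular directions are (action - 1) % 4 and
    (action + 1) % 4 under the 0=up, 1=right, 2=down, 3=left ordering.
    """
    candidates = [(action - 1) % 4, action, (action + 1) % 4]
    return [(Fraction(1, 3), d) for d in candidates]


# Intending to move right (1) may also result in up (0) or down (2).
for prob, direction in slippery_directions(1):
    print(prob, direction)
```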

Action Space: Discrete(4) with 0=up, 1=right, 2=down, 3=left.

Observation: Discrete(48) representing the player's position as row * 12 + col.
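
Decoding an observation back into grid coordinates is a single divmod. The helper names decode and encode below are chosen for this example and are not part of the environment's API.

```python
N_COLS = 12  # grid width from the description


def decode(obs):
    """Split a flat observation into (row, col)."""
    return divmod(obs, N_COLS)


def encode(row, col):
    """Inverse of decode: row * 12 + col."""
    return row * N_COLS + col


print(decode(36), decode(47))  # (3, 0) (3, 11)
```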

Rendering: Supports three render modes: "human" (PyGame window), "rgb_array" (numpy pixel array), and "ansi" (text grid with x=player, T=terminal, C=cliff, o=empty). PyGame rendering uses mountain-themed tile sprites with directional elf images.
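
A rough idea of the "ansi" text grid, using the symbols listed above. This is a sketch only: render_text is a hypothetical function, and the exact spacing and line endings of the library's output are not reproduced here.

```python
def render_text(player_state, rows=4, cols=12):
    """Build a text grid with x=player, T=terminal, C=cliff, o=empty."""
    lines = []
    for r in range(rows):
        cells = []
        for c in range(cols):
            if r * cols + c == player_state:
                cells.append("x")
            elif (r, c) == (rows - 1, cols - 1):
                cells.append("T")
            elif r == rows - 1 and 1 <= c <= cols - 2:
                cells.append("C")
            else:
                cells.append("o")
        lines.append("  ".join(cells))
    return "\n".join(lines)


print(render_text(36))
```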

Usage

Use this environment for classic tabular RL experiments such as SARSA, Q-learning, and policy iteration on the cliff walking problem. Create via gymnasium.make("CliffWalking-v1") or gymnasium.make("CliffWalkingSlippery-v1").
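
As one example of a tabular method, a single Q-learning update for this state and action space might look like the sketch below. The function name q_learning_update and the hyperparameters alpha=0.5 and gamma=1.0 are assumptions made for illustration.

```python
import numpy as np


def q_learning_update(Q, s, a, r, s_next, terminated, alpha=0.5, gamma=1.0):
    """Apply one Q-learning update in place and return Q."""
    # Do not bootstrap from the next state once the episode has terminated.
    target = r if terminated else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q


# 48 states x 4 actions, matching Discrete(48) and Discrete(4).
Q = np.zeros((48, 4))

# One transition: stepping right from the start falls off the cliff,
# giving reward -100 and returning the agent to state 36.
q_learning_update(Q, s=36, a=1, r=-100, s_next=36, terminated=False)
print(Q[36, 1])  # -50.0
```

In a full training loop, s, a, r, s_next, and terminated would come from env.reset() and env.step(action).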

Code Reference

Source Location

Repository: Farama_Foundation_Gymnasium
File: gymnasium/envs/toy_text/cliffwalking.py

Signature

class CliffWalkingEnv(Env):
    def __init__(self, render_mode: str | None = None, is_slippery: bool = False)
    def step(self, a) -> tuple[int, int, bool, bool, dict]
    def reset(self, *, seed: int | None = None, options: dict | None = None) -> tuple[int, dict]
    def render(self) -> str | np.ndarray | None

Import

import gymnasium as gym
env = gym.make("CliffWalking-v1")

# Slippery variant
env = gym.make("CliffWalkingSlippery-v1")

# Direct import
from gymnasium.envs.toy_text.cliffwalking import CliffWalkingEnv

I/O Contract

Inputs

Name	Type	Required	Description
render_mode	str or None	No	"human", "rgb_array", or "ansi"
is_slippery	bool	No	Enable stochastic transitions (default False)
a	int (0-3)	Yes (step)	0=up, 1=right, 2=down, 3=left
seed	int or None	No	Seed for reset

Outputs

Name	Type	Description
observation	int	Grid position (0-47) as row * 12 + col
reward	int	-1 per step, -100 for cliff
terminated	bool	True when reaching state 47 (goal)
truncated	bool	Always False from the environment itself (a TimeLimit wrapper, if applied, sets truncation)
info	dict	{"prob": float} with transition probability

Usage Examples

import gymnasium as gym

# Non-slippery cliff walking
env = gym.make("CliffWalking-v1")
obs, info = env.reset(seed=42)
print(f"Start state: {obs}")  # 36

# Take the safe path along the top row (the shortest path is [0] + [1] * 11 + [2])
for action in [0, 0, 0] + [1] * 11 + [2, 2, 2]:  # up x3, right x11, down x3
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated:
        print(f"Reached goal! Final state: {obs}")
        break

# Access transition probabilities directly (via the unwrapped env, since gym.make returns wrappers)
print(env.unwrapped.P[36][1])  # Transitions from start going right
# Expect a single outcome (1.0, 36, -100, False): falls off cliff, back to start

env.close()

Related Pages

Environment:Farama_Foundation_Gymnasium_Python_3_10_Runtime
