Implementation:Farama Foundation Gymnasium CliffWalkingEnv
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Toy_Text_Environments |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
The standard (non-JAX) Cliff Walking environment implementing a 4x12 gridworld where an agent navigates from start to goal while avoiding a cliff, registered as CliffWalking-v1.
Description
The CliffWalkingEnv class implements the classic Cliff Walking problem as a standard Gymnasium Env subclass with pre-computed transition probabilities.
Environment Layout: A 4x12 grid where the player starts at state 36 (position [3, 0]) and the goal is at state 47 (position [3, 11]). A cliff occupies positions [3, 1] through [3, 10]. Stepping onto the cliff sends the player back to the start with -100 reward.
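The layout arithmetic above can be checked with a few lines of plain Python. This is a minimal sketch, not the library's code: the helper names `to_state` and `to_position` and the constants are hypothetical, introduced only to illustrate the `row * 12 + col` indexing.

```python
N_ROWS, N_COLS = 4, 12  # grid shape from the description above


def to_state(row: int, col: int) -> int:
    """Flatten a (row, col) position into a single state index."""
    return row * N_COLS + col


def to_position(state: int) -> tuple[int, int]:
    """Recover (row, col) from a flat state index."""
    return divmod(state, N_COLS)


START = to_state(3, 0)                              # bottom-left corner
GOAL = to_state(3, 11)                              # bottom-right corner
CLIFF = {to_state(3, col) for col in range(1, 11)}  # bottom row between them

print(START, GOAL)              # 36 47
print(min(CLIFF), max(CLIFF))   # 37 46
```

The cliff therefore covers states 37 through 46, the ten cells between start and goal on the bottom row.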
Transition Model: All transition probabilities are pre-computed in __init__ and stored in the self.P dictionary, structured as P[state][action] = [(probability, next_state, reward, terminated)]. This tabular format enables direct use with dynamic programming algorithms. The _calculate_transition_prob method computes transitions, handling boundary clipping and cliff detection.
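To illustrate why this tabular format suits dynamic programming, here is a hedged sketch of value iteration over any table with the P[state][action] structure described above. The function `value_iteration` and the three-state `toy_P` table are hypothetical and written for this example; they are not part of Gymnasium.

```python
def value_iteration(P, gamma=1.0, tol=1e-8, max_sweeps=10_000):
    """Return state values for a table P[state][action] that maps to a list of
    (probability, next_state, reward, terminated) tuples."""
    V = {s: 0.0 for s in P}
    for _ in range(max_sweeps):
        delta = 0.0
        for s in P:
            # Bellman optimality backup: best expected return over actions.
            best = max(
                sum(p * (r + gamma * (0.0 if done else V[ns]))
                    for p, ns, r, done in outcomes)
                for outcomes in P[s].values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    return V


# Hand-built toy table (NOT the cliff grid): 0 -> 1 -> 2, where 2 is terminal.
toy_P = {
    0: {0: [(1.0, 1, -1, False)]},
    1: {0: [(1.0, 2, -1, True)], 1: [(1.0, 0, -1, False)]},
    2: {0: [(1.0, 2, 0, True)]},
}
V = value_iteration(toy_P)
print(V[0], V[1])  # -2.0 -1.0
```

Under the assumption that the environment's table follows the same structure, the same function should accept the P dictionary of an unwrapped CliffWalkingEnv instance.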
Slippery Mode: When is_slippery=True, the agent moves in the intended direction with probability 1/3 and in each of the two perpendicular directions with probability 1/3, so it never slips backwards. In non-slippery mode (the default), transitions are deterministic (probability 1.0). Two variants are registered: CliffWalking-v1 (non-slippery) and CliffWalkingSlippery-v1 (slippery).
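The transition rules described above (boundary clipping, cliff detection, and the slippery split) can be sketched in plain Python. This is an illustrative reimplementation under stated assumptions, not the library's _calculate_transition_prob: the function name `transitions` and the constants are hypothetical, and it assumes the action encoding 0=up, 1=right, 2=down, 3=left, in which the two perpendicular directions of an action are its neighbours modulo 4.

```python
N_ROWS, N_COLS = 4, 12
START, GOAL = 36, 47
# (row, col) offsets indexed by action: 0=up, 1=right, 2=down, 3=left.
DELTAS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def transitions(state: int, action: int, is_slippery: bool = False):
    """List the (probability, next_state, reward, terminated) outcomes."""
    row, col = divmod(state, N_COLS)
    if is_slippery:
        # Intended direction plus its two perpendicular neighbours.
        moves = [(action - 1) % 4, action, (action + 1) % 4]
    else:
        moves = [action]
    outcomes = []
    for move in moves:
        d_row, d_col = DELTAS[move]
        # Clip to the grid so a move into a wall leaves the agent in place.
        new_row = min(max(row + d_row, 0), N_ROWS - 1)
        new_col = min(max(col + d_col, 0), N_COLS - 1)
        if new_row == N_ROWS - 1 and 1 <= new_col <= N_COLS - 2:
            # Cliff cell: back to the start with the large penalty.
            outcomes.append((1 / len(moves), START, -100, False))
        else:
            new_state = new_row * N_COLS + new_col
            outcomes.append((1 / len(moves), new_state, -1, new_state == GOAL))
    return outcomes


print(transitions(36, 1))  # [(1.0, 36, -100, False)]
for outcome in transitions(36, 0, is_slippery=True):
    print(outcome)  # three outcomes, each with probability 1/3
```

In the slippery call, moving up from the start can slip left (blocked by the wall, so the agent stays at 36) or right (onto the cliff, so the agent returns to 36 with reward -100).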
Action Space: Discrete(4) with 0=up, 1=right, 2=down, 3=left.
Observation: Discrete(48) representing the player's position as row * 12 + col.
Rendering: Supports three render modes: "human" (PyGame window), "rgb_array" (numpy pixel array), and "ansi" (text grid with x=player, T=terminal, C=cliff, o=empty). PyGame rendering uses mountain-themed tile sprites with directional elf images.
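As a rough illustration of the "ansi" symbols listed above, the sketch below builds a text grid from a state index. The function `render_ansi` is hypothetical, and the exact spacing and line endings of the library's output are assumptions here; only the symbol meanings come from the description.

```python
def render_ansi(state: int, n_rows: int = 4, n_cols: int = 12) -> str:
    """Return a text grid: x=player, T=terminal, C=cliff, o=empty."""
    lines = []
    for row in range(n_rows):
        cells = []
        for col in range(n_cols):
            if row * n_cols + col == state:
                cells.append("x")
            elif (row, col) == (n_rows - 1, n_cols - 1):
                cells.append("T")
            elif row == n_rows - 1 and 1 <= col <= n_cols - 2:
                cells.append("C")
            else:
                cells.append("o")
        lines.append(" ".join(cells))
    return "\n".join(lines)


print(render_ansi(36))
```

For the start state this prints three rows of `o` cells followed by a bottom row that begins with `x`, continues with ten `C` cells, and ends with `T`.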
Usage
Use this environment for classic tabular RL experiments such as SARSA, Q-learning, and policy iteration on the cliff walking problem. Create via gymnasium.make("CliffWalking-v1") or gymnasium.make("CliffWalkingSlippery-v1").
Code Reference
Source Location
- Repository: Farama_Foundation_Gymnasium
- File: gymnasium/envs/toy_text/cliffwalking.py
Signature
class CliffWalkingEnv(Env):
    def __init__(self, render_mode: str | None = None, is_slippery: bool = False)
    def step(self, a) -> tuple[int, int, bool, bool, dict]
    def reset(self, *, seed: int | None = None, options: dict | None = None) -> tuple[int, dict]
    def render(self) -> str | np.ndarray | None
Import
import gymnasium as gym
env = gym.make("CliffWalking-v1")
# Slippery variant
env = gym.make("CliffWalkingSlippery-v1")
# Direct import
from gymnasium.envs.toy_text.cliffwalking import CliffWalkingEnv
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| render_mode | str or None | No | "human", "rgb_array", or "ansi" |
| is_slippery | bool | No | Enable stochastic transitions (default False) |
| a | int (0-3) | Yes (step) | 0=up, 1=right, 2=down, 3=left |
| seed | int or None | No | Seed for reset |
Outputs
| Name | Type | Description |
|---|---|---|
| observation | int | Grid position (0-47) as row * 12 + col |
| reward | int | -1 per step, -100 for cliff |
| terminated | bool | True when reaching state 47 (goal) |
| truncated | bool | Always False (TimeLimit wrapper handles truncation) |
| info | dict | {"prob": float} with transition probability |
Usage Examples
import gymnasium as gym
# Non-slippery cliff walking
env = gym.make("CliffWalking-v1")
obs, info = env.reset(seed=42)
print(f"Start state: {obs}")  # 36
# Take the safe path along the top row (the shortest path runs along row 2, next to the cliff)
for action in [0, 0, 0] + [1] * 11 + [2, 2, 2]:  # up x3, right x11, down x3
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated:
        print(f"Reached goal! Final state: {obs}")
        break
# Access transition probabilities directly; gym.make returns a wrapped env, so unwrap it first
print(env.unwrapped.P[36][1])  # Transitions from start going right
# [(1.0, 36, -100, False)] -- falls off cliff, back to start
env.close()