Implementation:Farama Foundation Gymnasium FrozenLakeEnv

From Leeroopedia
Knowledge Sources
Domains: Reinforcement_Learning, Toy_Text_Environments
Last Updated: 2026-02-15 03:00 GMT

Overview

FrozenLakeEnv implements the Frozen Lake environment, in which an agent crosses a grid of frozen tiles and holes to reach a goal. Slipperiness, map layout, and reward schedule are configurable, and the environment is registered as FrozenLake-v1.

Description

The FrozenLakeEnv class implements a gridworld navigation problem in which the player crosses a frozen lake from the start tile (S) to the goal tile (G), walking on frozen tiles (F) and avoiding holes (H).

Map System: Two pre-defined maps are included: "4x4" (16 states) and "8x8" (64 states). Custom maps can be passed as lists of strings. The generate_random_map(size, p, seed) helper creates random maps in which each tile is frozen with probability p, resampling until a depth-first search (DFS) confirms that a path from start to goal exists.
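The rejection-sampling idea behind generate_random_map can be sketched in plain Python. This is a minimal illustration, not Gymnasium's code: has_path and sample_map are hypothetical names, the sketch assumes S sits in the top-left and G in the bottom-right corner, and it draws tiles with the standard random module, so a given seed will not reproduce the library's maps.

```python
from __future__ import annotations

import random


def has_path(board: list[str]) -> bool:
    """Depth-first search from the top-left tile to any 'G', never entering 'H'."""
    n_rows, n_cols = len(board), len(board[0])
    stack = [(0, 0)]
    seen = {(0, 0)}
    while stack:
        r, c = stack.pop()
        if board[r][c] == "G":
            return True
        for dr, dc in ((0, -1), (1, 0), (0, 1), (-1, 0)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < n_rows and 0 <= nc < n_cols
            if inside and (nr, nc) not in seen and board[nr][nc] != "H":
                seen.add((nr, nc))
                stack.append((nr, nc))
    return False


def sample_map(size: int = 8, p: float = 0.8, seed: int | None = None) -> list[str]:
    """Hypothetical helper: draw F/H tiles until a start-to-goal path exists."""
    rng = random.Random(seed)
    while True:
        # Each tile is frozen with probability p, otherwise a hole.
        rows = [["F" if rng.random() < p else "H" for _ in range(size)] for _ in range(size)]
        rows[0][0], rows[-1][-1] = "S", "G"  # assumed corner placement
        board = ["".join(row) for row in rows]
        if has_path(board):
            return board
```

Because sampling repeats until a path exists, small values of p can take many attempts before a valid map is found.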

Slippery Physics: When is_slippery=True (the default), the agent moves in the intended direction with probability success_rate (default 1/3) and in each of the two perpendicular directions with probability (1 - success_rate) / 2. This models slippery ice and makes transitions stochastic; with is_slippery=False every move goes in the intended direction.
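The slip rule can be written as a probability distribution over the direction the agent actually moves. A minimal sketch, where slip_distribution is a hypothetical helper; it relies on the 0=left, 1=down, 2=right, 3=up encoding, under which (a - 1) % 4 and (a + 1) % 4 are the two directions perpendicular to a.

```python
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3


def slip_distribution(action: int, success_rate: float = 1.0 / 3.0) -> dict[int, float]:
    """Map each direction the agent may actually move in to its probability."""
    side = (1.0 - success_rate) / 2.0
    # Intended direction gets success_rate; the two perpendicular ones split the rest.
    return {(action - 1) % 4: side, action: success_rate, (action + 1) % 4: side}


# Intending to go right: right with 1/3, down with 1/3, up with 1/3 (default rate).
print(slip_distribution(RIGHT))
```

With success_rate=0.8, for example, the intended direction is taken 80% of the time and each perpendicular direction 10% of the time.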

Transition Model: Pre-computed in __init__ and stored in self.P[state][action] as a list of (probability, next_state, reward, terminated) tuples. Reaching a G or H tile ends the episode (terminated=True). Moving into the grid boundary leaves the agent in place.
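A simplified sketch of how such a table could be pre-computed from a map. build_transitions is a hypothetical name, not Gymnasium's internal code; it only supports the default (1, 0, 0) reward schedule, and it assumes that G and H tiles are absorbing (every action self-loops with reward 0 and terminated=True).

```python
def build_transitions(desc: list[str], is_slippery: bool = True,
                      success_rate: float = 1.0 / 3.0) -> dict[int, dict[int, list[tuple]]]:
    """Return P[state][action] -> list of (probability, next_state, reward, terminated)."""
    n_rows, n_cols = len(desc), len(desc[0])
    deltas = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # left, down, right, up

    def move(row: int, col: int, direction: int) -> tuple[int, int]:
        dr, dc = deltas[direction]
        # Clamp to the grid: walking into the boundary leaves the agent in place.
        return (min(max(row + dr, 0), n_rows - 1),
                min(max(col + dc, 0), n_cols - 1))

    P: dict[int, dict[int, list[tuple]]] = {}
    for row in range(n_rows):
        for col in range(n_cols):
            s = row * n_cols + col
            P[s] = {}
            for a in range(4):
                if desc[row][col] in "GH":
                    # Assumed absorbing terminal tiles: stay put, no reward.
                    P[s][a] = [(1.0, s, 0.0, True)]
                    continue
                if is_slippery:
                    side = (1.0 - success_rate) / 2.0
                    outcomes = [((a - 1) % 4, side), (a, success_rate), ((a + 1) % 4, side)]
                else:
                    outcomes = [(a, 1.0)]
                P[s][a] = []
                for direction, prob in outcomes:
                    nr, nc = move(row, col, direction)
                    tile = desc[nr][nc]
                    reward = 1.0 if tile == "G" else 0.0  # default (1, 0, 0) schedule only
                    P[s][a].append((prob, nr * n_cols + nc, reward, tile in "GH"))
    return P
```

For a 2x2 map ["SF", "HG"] without slipping, moving left from state 0 hits the wall and yields [(1.0, 0, 0.0, False)].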

Reward Schedule: Configurable via reward_schedule=(goal_reward, hole_reward, frozen_reward); the default (1, 0, 0) gives a reward of 1 only when the agent reaches the goal.
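The schedule can be read as a lookup on the tile the agent lands on. A minimal sketch with a hypothetical tile_reward helper; treating the start tile S like a frozen tile is an assumption, not something this page states.

```python
def tile_reward(tile: str, reward_schedule: tuple[float, float, float] = (1, 0, 0)) -> float:
    """Reward for landing on `tile` under a (goal, hole, frozen) schedule."""
    goal_reward, hole_reward, frozen_reward = reward_schedule
    if tile == "G":
        return float(goal_reward)
    if tile == "H":
        return float(hole_reward)
    # Assumption: every other tile, including the start tile S, earns frozen_reward.
    return float(frozen_reward)


print(tile_reward("H", (10, -5, -0.1)))  # hole penalty under a custom schedule
```

A negative frozen_reward makes every non-terminal step cost something, which pushes an agent toward shorter paths.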

Action Space: Discrete(4) with 0=left, 1=down, 2=right, 3=up.

Observation: Discrete(nS) where nS = nrow * ncol. The state is the player's position as row * ncol + col.
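The state index is a row-major flattening of the grid, so it converts to and from (row, col) with simple arithmetic. A small sketch using hypothetical helper names: in a 4x4 map, row 1, column 2 is state 1 * 4 + 2 = 6, and the bottom-right corner is state 15.

```python
# Action encoding as documented above.
ACTION_NAMES = {0: "left", 1: "down", 2: "right", 3: "up"}


def to_state(row: int, col: int, ncol: int) -> int:
    """Row-major index of a grid cell."""
    return row * ncol + col


def to_row_col(state: int, ncol: int) -> tuple[int, int]:
    """Inverse of to_state: quotient is the row, remainder is the column."""
    return divmod(state, ncol)


print(to_state(1, 2, ncol=4), to_row_col(15, ncol=4))
```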

Rendering: Supports "human" (PyGame window), "rgb_array" (numpy pixel array), and "ansi" (colored text grid). PyGame rendering uses ice, hole, cracked-hole, goal, start, and elf sprite tiles.

Usage

Use this environment for tabular RL experimentation, policy iteration, value iteration, Q-learning, and SARSA. The slippery default makes it a useful testbed for stochastic MDPs. Create via gymnasium.make("FrozenLake-v1") or gymnasium.make("FrozenLake8x8-v1").
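Because the full transition model is exposed, value iteration can run directly on it. The sketch below is generic: it only assumes the P[state][action] list-of-(probability, next_state, reward, terminated) layout described above, and it is demonstrated on a hypothetical 1x3 corridor "SFG" without slipping rather than on a real FrozenLake map.

```python
def value_iteration(P: dict, n_states: int, n_actions: int = 4,
                    gamma: float = 0.99, tol: float = 1e-10) -> tuple[list[float], list[int]]:
    V = [0.0] * n_states

    def q(s: int, a: int) -> float:
        # Expected one-step return; terminated transitions add no future value.
        return sum(p * (r + (0.0 if done else gamma * V[s2])) for p, s2, r, done in P[s][a])

    while True:
        delta = 0.0
        for s in range(n_states):
            best = max(q(s, a) for a in range(n_actions))
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break
    policy = [max(range(n_actions), key=lambda a: q(s, a)) for s in range(n_states)]
    return V, policy


# Hypothetical corridor: state 0 = S, 1 = F, 2 = G; actions 0=left, 1=down, 2=right, 3=up.
corridor_P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 0, 0.0, False)],
        2: [(1.0, 1, 0.0, False)], 3: [(1.0, 0, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)],
        2: [(1.0, 2, 1.0, True)], 3: [(1.0, 1, 0.0, False)]},
    2: {a: [(1.0, 2, 0.0, True)] for a in range(4)},
}
V, policy = value_iteration(corridor_P, n_states=3, gamma=0.9)
print(V, policy)  # states 0 and 1 both choose action 2 (right)
```

With Gymnasium installed, the same function should accept value_iteration(env.unwrapped.P, env.observation_space.n), since P is indexed as P[state][action].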

Code Reference

Source Location

Signature

def generate_random_map(size: int = 8, p: float = 0.8, seed: int | None = None) -> list[str]

class FrozenLakeEnv(Env):
    def __init__(
        self,
        render_mode: str | None = None,
        desc: list[str] = None,
        map_name: str = "4x4",
        is_slippery: bool = True,
        success_rate: float = 1.0 / 3.0,
        reward_schedule: tuple[int, int, int] = (1, 0, 0),
    )
    def step(self, a) -> tuple[int, float, bool, bool, dict]
    def reset(self, *, seed: int | None = None, options: dict | None = None) -> tuple[int, dict]
    def render(self) -> str | np.ndarray | None

Import

import gymnasium as gym
env = gym.make("FrozenLake-v1")

# With custom map
from gymnasium.envs.toy_text.frozen_lake import generate_random_map
env = gym.make("FrozenLake-v1", desc=generate_random_map(size=12, p=0.9))

I/O Contract

Inputs

Name | Type | Required | Description
render_mode | str or None | No | "human", "rgb_array", or "ansi"
desc | list[str] or None | No | Custom map as a list of strings (S=start, G=goal, F=frozen, H=hole); takes precedence over map_name
map_name | str | No | Pre-defined map name: "4x4" or "8x8" (default "4x4")
is_slippery | bool | No | Enable stochastic transitions (default True)
success_rate | float | No | Probability of moving in the intended direction when slippery (default 1/3)
reward_schedule | tuple[int, int, int] | No | (goal, hole, frozen) rewards (default (1, 0, 0))
a | int (0-3) | Yes (step) | Action: 0=left, 1=down, 2=right, 3=up

Outputs

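Name | Type | Description
observation | int | Grid position, row * ncol + col (0 to nS-1)
reward | float | Set by reward_schedule (default: 1 at the goal, 0 otherwise)
terminated | bool | True when the agent reaches G (goal) or H (hole)
truncated | bool | Always False from the environment itself; the TimeLimit wrapper added by gym.make sets it (100 steps for FrozenLake-v1, 200 for FrozenLake8x8-v1)
info | dict | {"prob": float}, the probability of the sampled transition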

Usage Examples

import gymnasium as gym

# Default 4x4 slippery lake
env = gym.make("FrozenLake-v1")
obs, info = env.reset(seed=42)

# Non-slippery for deterministic testing
env = gym.make("FrozenLake-v1", is_slippery=False)
obs, info = env.reset(seed=42)
obs, reward, terminated, truncated, info = env.step(2)  # Move right
print(f"State: {obs}, Prob: {info['prob']}")  # Probability is 1.0

# Custom reward schedule: +10 at the goal, -5 in a hole, -0.1 per frozen step
env = gym.make("FrozenLake-v1", reward_schedule=(10, -5, -0.1))

# Random 12x12 map
from gymnasium.envs.toy_text.frozen_lake import generate_random_map
env = gym.make("FrozenLake-v1", desc=generate_random_map(size=12, p=0.85, seed=42))

# Access transition model for planning
print(env.unwrapped.P[0][2])  # Transitions from state 0 going right

env.close()
