Implementation:Farama Foundation Gymnasium TaxiEnv

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, Toy_Text_Environments
Last Updated 2026-02-15 03:00 GMT

Overview

The Taxi environment, registered as Taxi-v3, has a taxi navigate a 5x5 grid to pick up a passenger at one of four designated locations and deliver them to another. It optionally supports rainy weather and fickle passengers.

Description

The TaxiEnv class implements the hierarchical RL taxi problem from Dietterich (2000). The environment features a 5x5 grid with four colored locations (Red, Green, Yellow, Blue), walls between certain cells, and a taxi that must navigate to pick up a passenger and deliver them to a destination.

State Encoding: 500 discrete states encoded as ((taxi_row * 5 + taxi_col) * 5 + passenger_location) * 4 + destination. Passenger locations 0-3 correspond to the four pickup points, and 4 means the passenger is in the taxi. Only 404 states are reachable during episodes: 400 in which the passenger is not already at the destination, plus 4 that can be observed immediately after a successful dropoff.
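The encoding formula can be checked with a short, self-contained sketch. The names encode_state and decode_state are illustrative and are not the environment's own methods; the counting at the end assumes the usual accounting in which the passenger is either riding in the taxi or waiting at a landmark other than the destination.

```python
def encode_state(taxi_row: int, taxi_col: int, pass_loc: int, dest_idx: int) -> int:
    """Pack the four state components into a single integer in [0, 499]."""
    return ((taxi_row * 5 + taxi_col) * 5 + pass_loc) * 4 + dest_idx


def decode_state(state: int) -> tuple[int, int, int, int]:
    """Invert encode_state by peeling off the components in reverse order."""
    state, dest_idx = divmod(state, 4)
    state, pass_loc = divmod(state, 5)
    taxi_row, taxi_col = divmod(state, 5)
    return taxi_row, taxi_col, pass_loc, dest_idx


# Round trip over the full state space.
assert all(encode_state(*decode_state(s)) == s for s in range(500))

# States in which the passenger is not already at the destination
# (pass_loc == 4, "in the taxi", never equals a destination index 0-3).
in_episode = sum(
    1 for s in range(500) if decode_state(s)[2] != decode_state(s)[3]
)
```

Here in_episode counts the states an episode can pass through before the final dropoff; the remaining reachable states are the post-dropoff ones.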

Action Space: Discrete(6) with 0=south, 1=north, 2=east, 3=west, 4=pickup, 5=dropoff.

Rewards: -1 per step unless another reward applies, +20 for a correct dropoff, -10 for an illegal pickup or dropoff. Moving into a wall leaves the taxi in place and still incurs the -1 step penalty.

Action Masking: The environment provides an action_mask in the info dict indicating which actions will cause a state change, enabling masked action sampling via env.action_space.sample(info["action_mask"]).
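To make the idea of masked sampling concrete, here is a minimal plain-Python sketch of what sampling under a mask amounts to. It is not Gymnasium's implementation: sample_masked and example_mask are hypothetical names, and raising an error on an all-zero mask is a choice made for this sketch only.

```python
import random


def sample_masked(mask: list[int], rng: random.Random) -> int:
    """Pick uniformly among the actions whose mask entry is nonzero."""
    valid = [action for action, allowed in enumerate(mask) if allowed]
    if not valid:
        raise ValueError("mask allows no actions")
    return rng.choice(valid)


rng = random.Random(0)
# Hypothetical mask: only south (0), east (2), and pickup (4) would change the state.
example_mask = [1, 0, 1, 0, 1, 0]
samples = {sample_masked(example_mask, rng) for _ in range(200)}
```

Every sampled action falls in the allowed set, so the agent never spends a step on an action the mask marks as having no effect.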

Rainy Weather (is_rainy=True): When enabled, movement actions succeed with 80% probability and slip to perpendicular directions with 10% each, similar to the slippery mechanics in FrozenLake. Wall checks are applied to slipped directions as well.
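The 80/10/10 split can be written down as an explicit distribution over the direction actually travelled. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, and wall checks are left out, so it only covers the choice of direction.

```python
import random

SOUTH, NORTH, EAST, WEST = 0, 1, 2, 3

# The two directions perpendicular to each movement action.
PERPENDICULAR = {
    SOUTH: (EAST, WEST),
    NORTH: (EAST, WEST),
    EAST: (SOUTH, NORTH),
    WEST: (SOUTH, NORTH),
}


def rainy_distribution(action: int) -> dict[int, float]:
    """Return {direction actually moved: probability} for an intended move."""
    side_a, side_b = PERPENDICULAR[action]
    return {action: 0.8, side_a: 0.1, side_b: 0.1}


def sample_rainy_move(action: int, rng: random.Random) -> int:
    """Draw the direction the taxi actually moves in when it rains."""
    dist = rainy_distribution(action)
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
```

For example, rainy_distribution(SOUTH) assigns 0.8 to south and 0.1 each to east and west; north is never chosen.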

Fickle Passenger (fickle_passenger=True): Once the passenger has been picked up and the taxi has moved one square away from the pickup location, the passenger changes their destination with 30% probability. This can happen at most once per episode.
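A rough sketch of the fickle-passenger rule follows. Several details are assumptions made for illustration and are not confirmed by the text above: the function name fickle_update, rolling the 30% chance at the moment the taxi first moves away, consuming the opportunity whether or not the destination changes, and choosing the new destination uniformly among the other three.

```python
import random


def fickle_update(
    dest_idx: int,
    opportunity_used: bool,
    rng: random.Random,
    p_change: float = 0.3,
) -> tuple[int, bool]:
    """Return (destination, opportunity_used) after the taxi's first move
    away from the pickup square with the passenger on board."""
    if opportunity_used:
        # The one-shot opportunity has already been spent this episode.
        return dest_idx, True
    if rng.random() < p_change:
        # Assumption: the new destination is uniform over the other three.
        others = [d for d in range(4) if d != dest_idx]
        return rng.choice(others), True
    return dest_idx, True


rng = random.Random(0)
changed = sum(fickle_update(2, False, rng)[0] != 2 for _ in range(10_000))
```

Over many trials the fraction changed / 10_000 should sit close to 0.3, and a second call with opportunity_used=True never alters the destination.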

Transition Model: Pre-computed in __init__ for all state-action pairs. The _build_dry_transitions method handles deterministic transitions while _build_rainy_transitions handles stochastic ones. Helper methods encode() and decode() convert between state integers and (row, col, pass_idx, dest_idx) tuples.
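The shape of a pre-computed transition table can be illustrated with a deliberately simplified sketch. It is not the environment's code: walls are ignored, the landmark coordinates in LOCS are assumed for illustration, every dropoff away from the destination is treated as illegal, and the function names are hypothetical. Each entry maps a state and action to a list of (probability, next_state, reward, terminated) tuples.

```python
# Assumed landmark coordinates (row, col); walls are ignored in this sketch.
LOCS = [(0, 0), (0, 4), (4, 0), (4, 3)]
MOVES = {0: (1, 0), 1: (-1, 0), 2: (0, 1), 3: (0, -1)}  # south, north, east, west


def encode(row, col, pass_loc, dest):
    return ((row * 5 + col) * 5 + pass_loc) * 4 + dest


def step_dry(row, col, pass_loc, dest, action):
    """Deterministic outcome of one action: (prob, next_state, reward, terminated)."""
    reward, terminated = -1, False
    if action in MOVES:
        d_row, d_col = MOVES[action]
        row = min(max(row + d_row, 0), 4)  # clamping at the edge is a -1 no-op
        col = min(max(col + d_col, 0), 4)
    elif action == 4:  # pickup
        if pass_loc < 4 and (row, col) == LOCS[pass_loc]:
            pass_loc = 4
        else:
            reward = -10
    else:  # dropoff
        if pass_loc == 4 and (row, col) == LOCS[dest]:
            pass_loc, reward, terminated = dest, 20, True
        else:
            reward = -10  # simplification: any other dropoff is illegal
    return (1.0, encode(row, col, pass_loc, dest), reward, terminated)


def build_dry_table():
    """Map state -> action -> [(prob, next_state, reward, terminated)]."""
    table = {}
    for row in range(5):
        for col in range(5):
            for pass_loc in range(5):
                for dest in range(4):
                    state = encode(row, col, pass_loc, dest)
                    table[state] = {
                        a: [step_dry(row, col, pass_loc, dest, a)]
                        for a in range(6)
                    }
    return table


P = build_dry_table()
```

A stochastic variant would store several tuples per action, with probabilities summing to 1, instead of the single deterministic outcome used here.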

Rendering: Supports "human" (PyGame window), "rgb_array" (numpy pixel array), and "ansi" (colored text with taxi, passenger, and destination markers). PyGame rendering uses directional taxi sprites, passenger and hotel icons, and gridworld median borders.

Usage

Use this environment for tabular RL, hierarchical RL, and options-based approaches. The action mask lets an agent avoid sampling actions that would leave the state unchanged. Create the environment via gymnasium.make("Taxi-v3").

Code Reference

Source Location

Signature

class TaxiEnv(Env):
    def __init__(
        self,
        render_mode: str | None = None,
        is_rainy: bool = False,
        fickle_passenger: bool = False,
    )
    def encode(self, taxi_row, taxi_col, pass_loc, dest_idx) -> int
    def decode(self, i) -> reversed
    def action_mask(self, state: int) -> np.ndarray
    def step(self, a) -> tuple[int, int, bool, bool, dict]
    def reset(self, *, seed: int | None = None, options: dict | None = None) -> tuple[int, dict]
    def render(self) -> str | np.ndarray | None

Import

import gymnasium as gym
env = gym.make("Taxi-v3")

# Direct import
from gymnasium.envs.toy_text.taxi import TaxiEnv

I/O Contract

Inputs

Name | Type | Required | Description
render_mode | str or None | No | "human", "rgb_array", or "ansi"
is_rainy | bool | No | Enable stochastic movement (80/10/10 probability, default False)
fickle_passenger | bool | No | Passenger may change destination (default False)
a | int (0-5) | Yes (step) | 0=south, 1=north, 2=east, 3=west, 4=pickup, 5=dropoff

Outputs

Name | Type | Description
observation | int | Encoded state (0-499)
reward | int | -1 (step), +20 (correct dropoff), -10 (illegal action)
terminated | bool | True when passenger is correctly dropped off
truncated | bool | Always False (TimeLimit wrapper handles truncation)
info | dict | {"prob": float, "action_mask": np.ndarray} with transition probability and valid action mask

Usage Examples

import gymnasium as gym

env = gym.make("Taxi-v3")
obs, info = env.reset(seed=42)

# Decode state to understand position
taxi_row, taxi_col, pass_loc, dest_idx = env.unwrapped.decode(obs)
print(f"Taxi: ({taxi_row},{taxi_col}), Passenger: {pass_loc}, Dest: {dest_idx}")

# Use action mask for safe exploration
action = env.action_space.sample(info["action_mask"])
obs, reward, terminated, truncated, info = env.step(action)

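# Text rendering uses a separate instance, since render_mode is fixed at creation
env_text = gym.make("Taxi-v3", render_mode="ansi")
obs, info = env_text.reset()
print(env_text.render())

# Close both environments to release their resources
env_text.close()
env.close()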
