Implementation:Farama Foundation Gymnasium TaxiEnv

Knowledge Sources	Farama_Foundation_Gymnasium Gymnasium Docs
Domains	Reinforcement_Learning, Toy_Text_Environments
Last Updated	2026-02-15 03:00 GMT

Overview

The Taxi environment, registered as Taxi-v3, has a taxi navigate a 5x5 grid to pick up a passenger and deliver them to one of four designated locations. It optionally supports rainy weather and fickle passengers.

Description

The TaxiEnv class implements the hierarchical RL taxi problem from Dietterich (2000). The environment features a 5x5 grid with four colored locations (Red, Green, Yellow, Blue), walls between certain cells, and a taxi that must navigate to pick up a passenger and deliver them to a destination.

State Encoding: 500 discrete states (25 taxi positions x 5 passenger locations x 4 destinations), encoded as ((taxi_row * 5 + taxi_col) * 5 + passenger_location) * 4 + destination. Passenger locations 0-3 correspond to the four pickup points, and 4 means the passenger is in the taxi. Only 404 of these states are reachable during episodes.
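The encoding formula above can be sketched as a pair of standalone functions. This is a minimal illustration of the arithmetic only, not the library's code; the names encode_state and decode_state are hypothetical and chosen for this example.

```python
def encode_state(taxi_row: int, taxi_col: int, pass_loc: int, dest_idx: int) -> int:
    """Pack the four components into a single integer in [0, 500)."""
    return ((taxi_row * 5 + taxi_col) * 5 + pass_loc) * 4 + dest_idx


def decode_state(state: int) -> tuple[int, int, int, int]:
    """Invert encode_state by peeling off the least significant component first."""
    state, dest_idx = divmod(state, 4)
    state, pass_loc = divmod(state, 5)
    taxi_row, taxi_col = divmod(state, 5)
    return taxi_row, taxi_col, pass_loc, dest_idx


example = encode_state(3, 1, 2, 0)
print(example, decode_state(example))  # prints: 328 (3, 1, 2, 0)
```

Because the destination is the least significant component, consecutive state integers differ first in destination, then in passenger location, then in taxi column and row.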

Action Space: Discrete(6) with 0=south, 1=north, 2=east, 3=west, 4=pickup, 5=dropoff.

Rewards: -1 per step unless another reward applies, +20 for a correct dropoff, -10 for an illegal pickup or dropoff. Moving into a wall leaves the taxi in place and still costs the -1 step reward.

Action Masking: The environment provides an action_mask in the info dict indicating which actions will cause a state change, enabling masked action sampling via env.action_space.sample(info["action_mask"]).
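To make the idea of masked sampling concrete, here is a plain-Python sketch of what drawing an action under a mask amounts to. It does not call Gymnasium; the function name sample_masked_action and the example mask values are assumptions made for illustration, and raising on an empty mask is a choice made for this sketch rather than documented library behavior.

```python
import random


def sample_masked_action(action_mask, rng: random.Random) -> int:
    """Pick uniformly among the actions whose mask entry is nonzero."""
    valid = [action for action, allowed in enumerate(action_mask) if allowed]
    if not valid:
        # Sketch-only choice: refuse to sample when nothing is allowed.
        raise ValueError("action_mask allows no actions")
    return rng.choice(valid)


# Hypothetical mask: only south (0) and east (2) would change the state.
mask = [1, 0, 1, 0, 0, 0]
action = sample_masked_action(mask, random.Random(0))
print(action in (0, 2))  # prints: True
```

With the real environment, the mask would come from info["action_mask"] and the sampling itself from env.action_space.sample(info["action_mask"]).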

Rainy Weather (is_rainy=True): When enabled, movement actions succeed with 80% probability and slip to perpendicular directions with 10% each, similar to the slippery mechanics in FrozenLake. Wall checks are applied to slipped directions as well.
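The 80/10/10 split described above can be written out as a small lookup. This is a sketch under stated assumptions: the PERPENDICULAR table and the function rainy_direction_probs are hypothetical names, the action numbering follows the Action Space entry, and wall checks are deliberately left out.

```python
SOUTH, NORTH, EAST, WEST = 0, 1, 2, 3

# Assumed pairing of each movement action with its two perpendicular directions.
PERPENDICULAR = {
    SOUTH: (EAST, WEST),
    NORTH: (EAST, WEST),
    EAST: (NORTH, SOUTH),
    WEST: (NORTH, SOUTH),
}


def rainy_direction_probs(intended: int) -> dict[int, float]:
    """Return the probability of each direction actually being attempted."""
    first, second = PERPENDICULAR[intended]
    return {intended: 0.8, first: 0.1, second: 0.1}


probs = rainy_direction_probs(EAST)
print(probs[EAST], probs[NORTH], probs[SOUTH])  # prints: 0.8 0.1 0.1
```

In the environment itself, each of these attempted directions is still subject to the wall checks, so a slipped move into a wall leaves the taxi where it is.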

Fickle Passenger (fickle_passenger=True): With 30% probability, the passenger changes their destination once they have been picked up and the taxi has moved one square away from the pickup location. This can happen only once per episode.

Transition Model: Pre-computed in __init__ for all state-action pairs. The _build_dry_transitions method handles deterministic transitions while _build_rainy_transitions handles stochastic ones. Helper methods encode() and decode() convert between state integers and (row, col, pass_idx, dest_idx) tuples.

Rendering: Supports "human" (PyGame window), "rgb_array" (numpy pixel array), and "ansi" (colored text with taxi, passenger, and destination markers). PyGame rendering uses directional taxi sprites, passenger and hotel icons, and gridworld median borders.

Usage

Use this environment for tabular RL, hierarchical RL, and options-based approaches. The action mask lets an agent avoid sampling actions that would not change the state. Create the environment via gymnasium.make("Taxi-v3").
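As an illustration of the tabular use case, the sketch below applies a single Q-learning update to a 500 x 6 table held in plain Python lists. Q-learning is one possible tabular method, chosen here only as an example; the function name q_update, the learning rate, the discount factor, and the sample transition are all assumptions for this sketch.

```python
N_STATES, N_ACTIONS = 500, 6


def q_update(q, state, action, reward, next_state, terminated,
             alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Do not bootstrap from the next state once the episode has terminated.
    bootstrap = 0.0 if terminated else max(q[next_state])
    target = reward + gamma * bootstrap
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

# Illustrative transition: action 1 (north) with the ordinary -1 step reward.
q_update(q_table, state=328, action=1, reward=-1, next_state=228, terminated=False)
print(round(q_table[328][1], 3))  # prints: -0.1
```

In a full training loop, the state, reward, next state, and terminated flag would come from env.step, and the table would be updated once per step.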

Code Reference

Source Location

Repository: Farama_Foundation_Gymnasium
File: gymnasium/envs/toy_text/taxi.py

Signature

class TaxiEnv(Env):
    def __init__(
        self,
        render_mode: str | None = None,
        is_rainy: bool = False,
        fickle_passenger: bool = False,
    )
    def encode(self, taxi_row, taxi_col, pass_loc, dest_idx) -> int
    def decode(self, i) -> reversed  # iterator over (taxi_row, taxi_col, pass_loc, dest_idx)
    def action_mask(self, state: int) -> np.ndarray
    def step(self, a) -> tuple[int, int, bool, bool, dict]
    def reset(self, *, seed: int | None = None, options: dict | None = None) -> tuple[int, dict]
    def render(self) -> str | np.ndarray | None

Import

import gymnasium as gym
env = gym.make("Taxi-v3")

# Direct import
from gymnasium.envs.toy_text.taxi import TaxiEnv

I/O Contract

Inputs

Name	Type	Required	Description
render_mode	str or None	No	"human", "rgb_array", or "ansi"
is_rainy	bool	No	Enable stochastic movement (80/10/10 probability, default False)
fickle_passenger	bool	No	Passenger may change destination (default False)
a	int (0-5)	Yes (step)	0=south, 1=north, 2=east, 3=west, 4=pickup, 5=dropoff

Outputs

Name	Type	Description
observation	int	Encoded state (0-499)
reward	int	-1 (step), +20 (correct dropoff), -10 (illegal pickup or dropoff)
terminated	bool	True when passenger is correctly dropped off
truncated	bool	Always False (TimeLimit wrapper handles truncation)
info	dict	{"prob": float, "action_mask": np.ndarray} with transition probability and valid action mask

Usage Examples

import gymnasium as gym

env = gym.make("Taxi-v3")
obs, info = env.reset(seed=42)

# Decode state to understand position
taxi_row, taxi_col, pass_loc, dest_idx = env.unwrapped.decode(obs)
print(f"Taxi: ({taxi_row},{taxi_col}), Passenger: {pass_loc}, Dest: {dest_idx}")

# Use action mask for safe exploration
action = env.action_space.sample(info["action_mask"])
obs, reward, terminated, truncated, info = env.step(action)

# Text rendering
env_text = gym.make("Taxi-v3", render_mode="ansi")
obs, info = env_text.reset()
print(env_text.render())

# Release both environments
env.close()
env_text.close()

Related Pages

Environment:Farama_Foundation_Gymnasium_Python_3_10_Runtime
