Implementation:Farama Foundation Gymnasium TaxiEnv
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Toy_Text_Environments |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
TaxiEnv is a toy-text environment in which a taxi navigates a 5x5 grid to pick up a passenger and deliver them to one of four designated locations. It is registered as Taxi-v3 and optionally supports rainy weather and fickle passengers.
Description
The TaxiEnv class implements the hierarchical RL taxi problem from Dietterich (2000). The environment features a 5x5 grid with four colored locations (Red, Green, Yellow, Blue), walls between certain cells, and a taxi that must navigate to pick up a passenger and deliver them to a destination.
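State Encoding: 500 discrete states (25 taxi positions x 5 passenger locations x 4 destinations), encoded as ((taxi_row * 5 + taxi_col) * 5 + passenger_location) * 4 + destination. Passenger locations 0-3 correspond to the four pickup points, and 4 means the passenger is in the taxi. Only 404 states are reachable during episodes: 400 in which the passenger is not already waiting at the destination, plus 4 that are observed right after a successful dropoff.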
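The encoding formula above can be sketched in plain Python. This is a minimal standalone illustration rather than the library's own code; the function names mirror the encode() and decode() helpers, but the bodies here are an assumption derived only from the formula.

```python
def encode(taxi_row, taxi_col, pass_loc, dest_idx):
    """Pack the four state components into one integer in [0, 500)."""
    # Radices: 5 rows, 5 columns, 5 passenger locations, 4 destinations.
    return ((taxi_row * 5 + taxi_col) * 5 + pass_loc) * 4 + dest_idx


def decode(state):
    """Invert encode() by peeling off components, least significant first."""
    state, dest_idx = divmod(state, 4)
    state, pass_loc = divmod(state, 5)
    taxi_row, taxi_col = divmod(state, 5)
    return taxi_row, taxi_col, pass_loc, dest_idx


state = encode(3, 1, 2, 0)
print(state)          # 328
print(decode(state))  # (3, 1, 2, 0)
```

Because the destination is the least significant component, states that differ only in destination are adjacent integers.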
Action Space: Discrete(6) with 0=south, 1=north, 2=east, 3=west, 4=pickup, 5=dropoff.
Rewards: -1 per step (default), +20 for a correct dropoff, -10 for an illegal pickup or dropoff. Moving into a wall leaves the taxi in place and still costs the -1 step reward.
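As a rough sketch of the reward rules above, the function below returns the reward for a single action. It is illustrative only: the name reward_for, the LOCS coordinates, and the handling of a dropoff at a landmark other than the destination are assumptions that are not stated in this page and should be checked against taxi.py.

```python
STEP_REWARD = -1      # default per-step reward
DROPOFF_REWARD = 20   # passenger delivered to the destination
ILLEGAL_REWARD = -10  # illegal pickup or dropoff

PICKUP, DROPOFF = 4, 5
IN_TAXI = 4  # passenger location index meaning "in the taxi"

# Illustrative (row, col) cells for the four landmarks (assumed values).
LOCS = [(0, 0), (0, 4), (4, 0), (4, 3)]


def reward_for(action, taxi_loc, pass_loc, dest_idx, locs=LOCS):
    """Return the reward for one action (hypothetical helper)."""
    if action == PICKUP:
        # Legal only if the passenger is waiting in the taxi's cell.
        legal = pass_loc != IN_TAXI and taxi_loc == locs[pass_loc]
        return STEP_REWARD if legal else ILLEGAL_REWARD
    if action == DROPOFF:
        if pass_loc != IN_TAXI:
            return ILLEGAL_REWARD
        if taxi_loc == locs[dest_idx]:
            return DROPOFF_REWARD
        # Assumption: dropping off at a landmark other than the destination
        # counts as an ordinary step; anywhere else it is illegal.
        return STEP_REWARD if taxi_loc in locs else ILLEGAL_REWARD
    return STEP_REWARD  # movement, including bumping into a wall


print(reward_for(PICKUP, (0, 0), 0, 1))         # -1
print(reward_for(DROPOFF, (0, 4), IN_TAXI, 1))  # 20
print(reward_for(PICKUP, (2, 2), 0, 1))         # -10
```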
Action Masking: The environment provides an action_mask in the info dict indicating which actions will cause a state change, enabling masked action sampling via env.action_space.sample(info["action_mask"]).
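Beyond masked random sampling, the same mask can restrict greedy action selection. The helper below is a hypothetical sketch (masked_greedy is not part of Gymnasium); it assumes the mask is a length-6 sequence of 0/1 flags such as the array found in info["action_mask"].

```python
import random


def masked_greedy(q_values, action_mask):
    """Pick the highest-valued action among those the mask allows."""
    allowed = [a for a, ok in enumerate(action_mask) if ok]
    if not allowed:
        # Fall back to every action if the mask rules everything out.
        allowed = list(range(len(q_values)))
    best = max(q_values[a] for a in allowed)
    # Break ties between equally good actions at random.
    return random.choice([a for a in allowed if q_values[a] == best])


q_row = [0.5, 2.0, 0.1, 0.0, 3.0, -1.0]  # made-up action values
mask = [1, 1, 0, 1, 0, 0]                # east, pickup, dropoff masked out
print(masked_greedy(q_row, mask))        # 1
```

Pickup has the highest value in this made-up row, but the mask excludes it, so the best remaining action (north) is chosen.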
Rainy Weather (is_rainy=True): When enabled, movement actions succeed with 80% probability and slip to perpendicular directions with 10% each, similar to the slippery mechanics in FrozenLake. Wall checks are applied to slipped directions as well.
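The 80/10/10 split can be illustrated with a small standalone sketch. The names below are hypothetical, and the sketch deliberately ignores wall checks; it only shows how an intended movement action maps to a distribution over effective directions.

```python
import random

SOUTH, NORTH, EAST, WEST = 0, 1, 2, 3

# Directions perpendicular to each movement action.
PERPENDICULAR = {
    SOUTH: (EAST, WEST),
    NORTH: (EAST, WEST),
    EAST: (SOUTH, NORTH),
    WEST: (SOUTH, NORTH),
}


def rainy_outcomes(action):
    """Return [(probability, effective_direction), ...] for a move."""
    side_a, side_b = PERPENDICULAR[action]
    return [(0.8, action), (0.1, side_a), (0.1, side_b)]


def sample_direction(action, rng):
    """Draw one effective direction according to rainy_outcomes()."""
    probs, directions = zip(*rainy_outcomes(action))
    return rng.choices(directions, weights=probs, k=1)[0]


print(rainy_outcomes(EAST))  # [(0.8, 2), (0.1, 0), (0.1, 1)]
rng = random.Random(0)
print([sample_direction(EAST, rng) for _ in range(10)])
```

The taxi never slips backwards in this sketch: the opposite direction has zero probability.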
Fickle Passenger (fickle_passenger=True): Once the passenger has been picked up and the taxi has moved one square away from the pickup location, the passenger changes their destination with 30% probability. This can happen only once per episode.
Transition Model: Pre-computed in __init__ for all state-action pairs. The _build_dry_transitions method handles deterministic transitions, while _build_rainy_transitions handles stochastic ones. The helper methods encode() and decode() convert between state integers and their (row, col, pass_idx, dest_idx) components; decode() returns a reversed iterator that can be unpacked into those four values.
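A pre-computed transition table makes model-based planning straightforward. The sketch below runs value iteration over a table shaped as P[state][action] = [(prob, next_state, reward, terminated), ...]. That layout, and its availability as env.unwrapped.P, is an assumption based on the usual toy-text convention rather than something stated in this page. TOY_P is a made-up two-state example so the block runs on its own.

```python
def value_iteration(P, gamma=0.95, tol=1e-8):
    """Compute optimal state values for a tabular transition model."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Best expected one-step return over all actions in state s.
            best = max(
                sum(
                    p * (r + (0.0 if done else gamma * V[s2]))
                    for p, s2, r, done in P[s][a]
                )
                for a in P[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


# Toy model: state 0 can wait (-1) or finish (+20); state 1 is terminal.
TOY_P = {
    0: {0: [(1.0, 0, -1, False)], 1: [(1.0, 1, 20, True)]},
    1: {0: [(1.0, 1, 0, True)]},
}
print(value_iteration(TOY_P))  # {0: 20.0, 1: 0.0}
```

If the table is indeed exposed as env.unwrapped.P, calling value_iteration(env.unwrapped.P) should yield values for all 500 encoded states, including the unreachable ones.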
Rendering: Supports "human" (PyGame window), "rgb_array" (numpy pixel array), and "ansi" (colored text with taxi, passenger, and destination markers). PyGame rendering uses directional taxi sprites, passenger and hotel icons, and gridworld median borders.
Usage
Use this environment for tabular RL, hierarchical RL, and options-based approaches. The action mask lets an agent restrict exploration to actions that actually change the state. Create the environment via gymnasium.make("Taxi-v3").
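As one example of the tabular use case, here is a minimal Q-learning sketch. It is written against the Gymnasium reset/step interface but does not import Gymnasium; CorridorEnv is a hypothetical stand-in included only so the block runs on its own, and the hyperparameters are arbitrary.

```python
import random


def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    """Tabular epsilon-greedy Q-learning over integer observations."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for episode in range(episodes):
        state, _info = env.reset(seed=seed + episode)
        done = False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)  # explore
            else:
                action = max(range(n_actions), key=lambda a: Q[state][a])
            next_state, reward, terminated, truncated, _info = env.step(action)
            # Bootstrap only when the episode has not terminated.
            target = reward + (0.0 if terminated else gamma * max(Q[next_state]))
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
            done = terminated or truncated
    return Q


class CorridorEnv:
    """Hypothetical 4-cell corridor: 0=left, 1=right, goal at the right end."""

    def reset(self, *, seed=None, options=None):
        self.pos, self.steps = 0, 0
        return self.pos, {}

    def step(self, action):
        self.steps += 1
        self.pos = max(0, self.pos - 1) if action == 0 else self.pos + 1
        terminated = self.pos == 3
        reward = 20 if terminated else -1
        return self.pos, reward, terminated, self.steps >= 50, {}


Q = q_learning(CorridorEnv(), n_states=4, n_actions=2)
print(Q[0][1] > Q[0][0])  # moving right should look better from the start
```

With Gymnasium installed, the same function should accept gym.make("Taxi-v3") with n_states=500 and n_actions=6, since Taxi observations are plain integers and the TimeLimit wrapper supplies truncation.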
Code Reference
Source Location
- Repository: Farama_Foundation_Gymnasium
- File: gymnasium/envs/toy_text/taxi.py
Signature
class TaxiEnv(Env):
    def __init__(
        self,
        render_mode: str | None = None,
        is_rainy: bool = False,
        fickle_passenger: bool = False,
    )
    def encode(self, taxi_row, taxi_col, pass_loc, dest_idx) -> int
    def decode(self, i) -> reversed
    def action_mask(self, state: int) -> np.ndarray
    def step(self, a) -> tuple[int, int, bool, bool, dict]
    def reset(self, *, seed: int | None = None, options: dict | None = None) -> tuple[int, dict]
    def render(self) -> str | np.ndarray | None
Import
import gymnasium as gym
env = gym.make("Taxi-v3")
# Direct import
from gymnasium.envs.toy_text.taxi import TaxiEnv
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| render_mode | str or None | No | "human", "rgb_array", or "ansi" |
| is_rainy | bool | No | Enable stochastic movement (80/10/10 probability, default False) |
| fickle_passenger | bool | No | Passenger may change destination (default False) |
| a | int (0-5) | Yes (step) | 0=south, 1=north, 2=east, 3=west, 4=pickup, 5=dropoff |
Outputs
| Name | Type | Description |
|---|---|---|
| observation | int | Encoded state (0-499) |
| reward | int | -1 (step), +20 (correct dropoff), -10 (illegal pickup or dropoff) |
| terminated | bool | True when passenger is correctly dropped off |
| truncated | bool | Always False (TimeLimit wrapper handles truncation) |
| info | dict | {"prob": float, "action_mask": np.ndarray} with transition probability and valid action mask |
Usage Examples
import gymnasium as gym
env = gym.make("Taxi-v3")
obs, info = env.reset(seed=42)
# Decode state to understand position
taxi_row, taxi_col, pass_loc, dest_idx = env.unwrapped.decode(obs)
print(f"Taxi: ({taxi_row},{taxi_col}), Passenger: {pass_loc}, Dest: {dest_idx}")
# Use action mask for safe exploration
action = env.action_space.sample(info["action_mask"])
obs, reward, terminated, truncated, info = env.step(action)
# Text rendering
env_text = gym.make("Taxi-v3", render_mode="ansi")
obs, info = env_text.reset()
print(env_text.render())
# Release resources held by both environments
env_text.close()
env.close()