Principle:Farama Foundation Gymnasium Tabular Environments

Knowledge Sources	Farama_Foundation_Gymnasium Gymnasium Docs
Domains	Reinforcement_Learning, Tabular_Methods
Last Updated	2026-02-15 03:00 GMT

Overview

Discrete-state environments with finite state and action spaces enable exact tabular reinforcement learning methods such as Q-learning and SARSA.

Description

Tabular environments define Markov Decision Processes with small, discrete state and action spaces that can be represented explicitly as tables or matrices. In these environments, each state is identified by a single integer, and the transition dynamics and reward function can in principle be enumerated completely. This makes them ideally suited for tabular RL methods that maintain value estimates for every state-action pair.
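As a minimal sketch of what "represented explicitly as tables" can look like, the snippet below enumerates a hypothetical 2x2 deterministic grid. The layout `P[state][action] -> [(probability, next_state, reward, terminated)]` is an assumption chosen for illustration (it resembles the transition dictionaries exposed by Gymnasium's toy-text environments), and the names `to_state`, `build_table`, and `GOAL` are invented for this example.

```python
# Hypothetical 2x2 deterministic grid; state index = row * N_COLS + col.
N_ROWS, N_COLS = 2, 2
ACTIONS = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # left, down, right, up
GOAL = 3  # bottom-right cell


def to_state(row, col):
    """Map grid coordinates to a single integer state."""
    return row * N_COLS + col


def build_table():
    """Enumerate P[state][action] -> [(prob, next_state, reward, terminated)]."""
    table = {}
    for row in range(N_ROWS):
        for col in range(N_COLS):
            s = to_state(row, col)
            table[s] = {}
            for a, (d_row, d_col) in ACTIONS.items():
                # Moves that would leave the grid keep the agent in place.
                new_row = min(max(row + d_row, 0), N_ROWS - 1)
                new_col = min(max(col + d_col, 0), N_COLS - 1)
                s_next = to_state(new_row, new_col)
                reward = 1.0 if s_next == GOAL else 0.0
                table[s][a] = [(1.0, s_next, reward, s_next == GOAL)]
    return table


P = build_table()
print(len(P), "states,", len(ACTIONS), "actions")
print(P[0][2])  # outcome of moving right from state 0
```

Because every state-action pair has an explicit entry, the full dynamics fit in a structure with only 4 x 4 entries, which is what makes exhaustive methods feasible here.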

The core tabular environments include grid-world navigation problems such as cliff walking (crossing a grid while avoiding a cliff edge), frozen lake (traversing a slippery grid to reach a goal without falling into holes), and taxi (picking up and delivering a passenger in a small grid city). These environments feature stochastic or deterministic transitions and simple reward structures (a sparse goal reward in frozen lake, per-step penalties in cliff walking and taxi). Because their dynamics are fully known, their optimal policies can be computed exactly via dynamic programming. Their simplicity allows learners to focus on understanding fundamental RL concepts without the complications of function approximation or continuous spaces.

In the RL curriculum, tabular environments serve as the entry point for understanding core algorithms. They are the canonical testbeds for value iteration, policy iteration, Monte Carlo methods, and temporal-difference methods such as Q-learning and SARSA. The ability to inspect and visualize the complete value function and policy makes them invaluable for pedagogy and algorithm debugging.

Usage

Use tabular environments when implementing or teaching fundamental RL algorithms that operate on discrete state-action pairs. They are appropriate for verifying the correctness of Q-learning, SARSA, expected SARSA, and dynamic programming implementations. They also serve as minimal reproducible examples for debugging RL training loops and for studying convergence properties of tabular methods.

Theoretical Basis

Tabular environments are formalized as finite Markov Decision Processes $(S, A, P, R, \gamma)$ where $S$ is a finite set of states, $A$ is a finite set of actions, $P(s' \mid s, a)$ is the transition probability function, $R(s, a, s')$ is the reward function, and $\gamma$ is the discount factor.

The Q-learning update rule for tabular environments:

$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$

The SARSA update rule:

$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma Q(s', a') - Q(s, a) \right]$
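The two update rules above differ only in the bootstrap term. The following sketch applies both to the same transition so the difference is visible numerically; the function names, the nested-list table, and the chosen values are illustrative assumptions rather than part of any library.

```python
def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Off-policy update: bootstrap from the greedy value of the next state."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy update: bootstrap from the action actually taken next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


# Two states, two actions; values chosen arbitrarily for illustration.
Q_ql = [[0.0, 0.0], [1.0, 3.0]]
Q_sarsa = [[0.0, 0.0], [1.0, 3.0]]

q_learning_update(Q_ql, s=0, a=0, r=1.0, s_next=1, alpha=0.5, gamma=0.5)
sarsa_update(Q_sarsa, s=0, a=0, r=1.0, s_next=1, a_next=0, alpha=0.5, gamma=0.5)

print(Q_ql[0][0])     # 0.5 * (1.0 + 0.5 * 3.0) = 1.25
print(Q_sarsa[0][0])  # 0.5 * (1.0 + 0.5 * 1.0) = 0.75
```

Q-learning uses the best next action (value 3.0) regardless of what the agent does next, while SARSA uses the value of the next action it actually selected (here action 0, value 1.0).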

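The Bellman optimality equation, whose solution $Q^{*}$ these methods aim to approximate (Q-learning converges to it under standard step-size and exploration conditions; SARSA does so when its behavior policy becomes greedy in the limit):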

$Q^{*}(s, a) = \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma \max_{a'} Q^{*}(s', a') \right]$
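When the transition table is known, the equation above can be solved directly by repeating its right-hand side as an update (value iteration on Q). The sketch below does this for a hypothetical 3-state chain; the table layout `P[s][a] -> [(prob, next_state, reward, terminated)]`, the function name, and the tolerance are assumptions made for the example.

```python
def value_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-10):
    """Repeat the Bellman optimality backup until Q stops changing."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    while True:
        delta = 0.0
        for s in range(n_states):
            for a in range(n_actions):
                backup = 0.0
                for prob, s_next, reward, terminated in P[s][a]:
                    # Terminal successors contribute no future value.
                    future = 0.0 if terminated else max(Q[s_next])
                    backup += prob * (reward + gamma * future)
                delta = max(delta, abs(backup - Q[s][a]))
                Q[s][a] = backup
        if delta < tol:
            return Q


def greedy_policy(Q):
    """Pick the highest-valued action in each state (first one on ties)."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in Q]


# Hypothetical 3-state chain: action 1 moves right, action 0 stays put.
# Reaching state 2 yields reward 1 and ends the episode.
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 1, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}
Q_star = value_iteration(P, n_states=3, n_actions=2)
policy = greedy_policy(Q_star)
print(Q_star)
print(policy)
```

With a discount of 0.9, moving right from state 1 is worth 1.0 and moving right from state 0 is worth about 0.9, so the greedy policy moves right in both non-terminal states. A table computed this way can serve as a reference when checking what a learning algorithm converges to.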

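# Tabular Q-learning on a discrete environment (Gymnasium API).
# The environment ID and hyperparameters are illustrative choices.
import gymnasium as gym
import numpy as np

env = gym.make("FrozenLake-v1")
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1
for episode in range(5000):
    state, info = env.reset()  # reset() returns (observation, info)
    done = False
    while not done:
        # Epsilon-greedy: explore with probability epsilon, otherwise act greedily.
        action = env.action_space.sample() if np.random.rand() < epsilon else int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, info = env.step(action)
        # Do not bootstrap from a terminal state.
        target = reward + gamma * (0.0 if terminated else np.max(Q[next_state]))
        Q[state, action] += alpha * (target - Q[state, action])
        state, done = next_state, terminated or truncated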

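For stochastic environments like Frozen Lake, the transition function $P(s' \mid s, a)$ encodes slippage: with the slippery setting enabled, the agent moves in the intended direction only part of the time (one third by default) and otherwise slides in one of the two perpendicular directions.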
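The sketch below builds such a transition distribution for a single state-action pair on a 4x4 grid. It assumes the intended direction and the two perpendicular directions are each taken with probability 1/3, ignores holes and the goal, and uses a left/down/right/up action ordering; the function name `slippery_transitions` is hypothetical.

```python
from collections import defaultdict

# Assumed action ordering: left, down, right, up.
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3
MOVES = {LEFT: (0, -1), DOWN: (1, 0), RIGHT: (0, 1), UP: (-1, 0)}


def slippery_transitions(state, action, n_rows=4, n_cols=4):
    """Return {next_state: probability} for one state-action pair.

    Assumes the intended direction and the two perpendicular directions
    are each taken with probability 1/3 (an illustrative slip model).
    """
    row, col = divmod(state, n_cols)
    dist = defaultdict(float)
    # (action - 1) % 4 and (action + 1) % 4 are the perpendicular directions.
    for actual in ((action - 1) % 4, action, (action + 1) % 4):
        d_row, d_col = MOVES[actual]
        # Moves that would leave the grid keep the agent in place.
        new_row = min(max(row + d_row, 0), n_rows - 1)
        new_col = min(max(col + d_col, 0), n_cols - 1)
        dist[new_row * n_cols + new_col] += 1.0 / 3.0
    return dict(dist)


# From the top-left corner (state 0), intending to move right:
print(slippery_transitions(state=0, action=RIGHT))
```

From the corner, the upward slip hits the wall and leaves the agent in place, so the probability mass is split evenly across states 0, 1, and 4. Outcomes that land on the same cell have their probabilities summed, which is why a single action can map to fewer than three distinct successor states.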
