Implementation:Farama Foundation Gymnasium BlackjackFunctional
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Tabular_Environments |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
A JAX-accelerated functional implementation of the Blackjack card game environment, registered as tabular/Blackjack-v0, with JIT-compiled state transitions and PyGame-based rendering.
Description
The blackjack module implements Blackjack as a functional environment using the FuncEnv API, designed for JAX acceleration.
Game Mechanics: The game uses an infinite deck (sampling with replacement) from 13 card values where face cards count as 10 and aces can count as 1 or 11 (usable ace). The player receives two initial cards and can hit (action=1) or stick (action=0). On stick, the dealer draws until reaching 17 or more. The game ends when the player busts or sticks.
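The infinite-deck sampling described above can be sketched as follows. This is a minimal illustration, not the module's own code: the names DECK and draw_card are hypothetical, and the ace is stored as 1 here, with the 1-or-11 decision left to the hand-scoring step.

```python
import jax
import jax.numpy as jnp

# 13 ranks. The ace is stored as 1; 10, J, Q and K all count as 10.
DECK = jnp.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10])


def draw_card(key):
    """Draw one card value with replacement and return (card, new_key)."""
    key, subkey = jax.random.split(key)
    # Sampling from the full rank array every time models an infinite deck:
    # earlier draws never change the odds of later ones.
    card = jax.random.choice(subkey, DECK)
    return card, key


key = jax.random.PRNGKey(0)
first, key = draw_card(key)
second, key = draw_card(key)
print(int(first), int(second))
```

Because four of the 13 ranks are worth 10, a ten-valued card is drawn with probability 4/13 on every draw.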
State Representation (EnvState NamedTuple): Contains dealer_hand (21-element jax array), player_hand (21-element jax array), dealer_cards (count), player_cards (count), and done flag. The 21-element arrays accommodate the maximum possible hand size.
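A fixed-size hand buffer paired with a card count keeps array shapes static, which JIT compilation needs. The sketch below assumes zero padding for empty slots; HandState and add_player_card are hypothetical names that mirror the fields listed above and are not taken from the module.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class HandState(NamedTuple):
    """Hypothetical stand-in with the same fields as the described EnvState."""

    dealer_hand: jax.Array  # 21 slots, unused slots assumed to hold 0
    player_hand: jax.Array  # 21 slots, unused slots assumed to hold 0
    dealer_cards: int  # number of filled dealer slots
    player_cards: int  # number of filled player slots
    done: bool


def add_player_card(state, card):
    """Write a card into the next free player slot and bump the count."""
    new_hand = state.player_hand.at[state.player_cards].set(card)
    return state._replace(player_hand=new_hand, player_cards=state.player_cards + 1)


empty = jnp.zeros(21, dtype=jnp.int32)
state = HandState(empty, empty, 0, 0, False)
state = add_player_card(state, 10)
state = add_player_card(state, 1)
print(state.player_cards, state.player_hand[:3].tolist())  # 2 [10, 1, 0]
```

JAX arrays are immutable, so the update returns a new array through the .at[...].set(...) syntax rather than writing in place.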
Observation: A 3-element int32 array: [player_sum, dealer_showing_card, has_usable_ace].
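One way to derive this observation from zero-padded hands is sketched below. The helper names, and the choice of the dealer's first card as the showing card, are assumptions made for illustration.

```python
import jax.numpy as jnp


def usable_ace(hand):
    """True if the hand holds an ace that can count as 11 without busting."""
    return jnp.logical_and(jnp.any(hand == 1), hand.sum() + 10 <= 21)


def sum_hand(hand):
    """Hand total, counting one ace as 11 when that does not bust."""
    return hand.sum() + 10 * usable_ace(hand)


def observe(player_hand, dealer_hand):
    """Build [player_sum, dealer_showing_card, has_usable_ace] as int32."""
    parts = [sum_hand(player_hand), dealer_hand[0], usable_ace(player_hand)]
    return jnp.stack([p.astype(jnp.int32) for p in parts])


# Player holds ace + 6 (soft 17); dealer shows a 10 with a 5 underneath.
player = jnp.zeros(21, dtype=jnp.int32).at[:2].set(jnp.array([1, 6]))
dealer = jnp.zeros(21, dtype=jnp.int32).at[:2].set(jnp.array([10, 5]))
obs = observe(player, dealer)
print(obs.tolist())  # [17, 10, 1]
```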
Rewards: +1 for winning, -1 for losing, 0 for a draw. With natural=True and sutton_and_barto=False, a win with a natural blackjack (21 from the first two cards) gives +1.5 instead of +1. With sutton_and_barto=True, a player's natural blackjack beats a dealer 21 that is not a natural.
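The reward rules above can be sketched as a single function over the final hands. This is an illustrative reading of those rules under the assumption of zero-padded 21-slot hands; final_reward, make_hand and the other helpers are hypothetical names, not the module's API.

```python
import jax.numpy as jnp


def make_hand(cards):
    """Zero-padded 21-slot hand (padding is an assumption of this sketch)."""
    return jnp.zeros(21, dtype=jnp.int32).at[: len(cards)].set(jnp.array(cards))


def sum_hand(hand):
    soft = jnp.logical_and(jnp.any(hand == 1), hand.sum() + 10 <= 21)
    return hand.sum() + 10 * soft


def score(hand):
    """Hand total, or 0 if the hand is bust."""
    total = sum_hand(hand)
    return jnp.where(total > 21, 0, total)


def is_natural(hand):
    """Exactly two cards totalling 21."""
    return jnp.logical_and(jnp.count_nonzero(hand) == 2, sum_hand(hand) == 21)


def final_reward(player, dealer, natural=False, sutton_and_barto=True):
    # Base outcome: +1 win, 0 draw, -1 loss; a bust player always loses.
    outcome = jnp.sign(score(player) - score(dealer)).astype(jnp.float32)
    reward = jnp.where(sum_hand(player) > 21, -1.0, outcome)
    if natural and not sutton_and_barto:
        # A winning natural pays 1.5 instead of 1.
        bonus = jnp.logical_and(is_natural(player), reward == 1.0)
        reward = jnp.where(bonus, 1.5, reward)
    if sutton_and_barto:
        # A player natural beats any dealer hand that is not a natural.
        wins = jnp.logical_and(is_natural(player), jnp.logical_not(is_natural(dealer)))
        reward = jnp.where(wins, 1.0, reward)
    return reward


blackjack = make_hand([10, 1])
three_sevens = make_hand([7, 7, 7])  # 21, but not a natural
print(float(final_reward(blackjack, three_sevens)))  # 1.0
print(float(final_reward(blackjack, three_sevens, sutton_and_barto=False)))  # 0.0
print(float(final_reward(blackjack, make_hand([10, 7]), natural=True, sutton_and_barto=False)))  # 1.5
```

The two Python-level if statements branch on configuration flags, not on traced values, so they stay compatible with JIT compilation as long as the flags are static.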
Key JAX Features: The transition function uses jax.lax.cond to branch between hit and stick actions without breaking JIT compilation. The dealer's draw loop uses jax.lax.while_loop for JIT-compatible iteration.
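The pattern of combining jax.lax.cond with jax.lax.while_loop can be sketched as below. This is a simplified stand-in for the real transition function: it works on bare hand arrays and counts instead of EnvState, and every function name in it is hypothetical.

```python
import jax
import jax.numpy as jnp

DECK = jnp.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10])


def hand_total(hand):
    soft = jnp.logical_and(jnp.any(hand == 1), hand.sum() + 10 <= 21)
    return hand.sum() + 10 * soft


def draw_into(hand, count, key):
    """Draw one card with replacement into slot `count`."""
    key, subkey = jax.random.split(key)
    card = jax.random.choice(subkey, DECK)
    return hand.at[count].set(card), count + 1, key


def dealer_play(hand, count, key):
    """Dealer draws until the total reaches 17 or more."""
    def needs_card(carry):
        return hand_total(carry[0]) < 17

    def draw(carry):
        return draw_into(*carry)

    # while_loop keeps the data-dependent loop inside the compiled program.
    return jax.lax.while_loop(needs_card, draw, (hand, count, key))


@jax.jit
def step(action, player, p_count, dealer, d_count, key):
    def hit(args):
        p, pc, d, dc, k = args
        p, pc, k = draw_into(p, pc, k)
        return p, pc, d, dc, k

    def stick(args):
        p, pc, d, dc, k = args
        d, dc, k = dealer_play(d, dc, k)
        return p, pc, d, dc, k

    # cond selects a branch on a traced value; both branches must return
    # outputs with identical structure, shapes and dtypes.
    return jax.lax.cond(action == 1, hit, stick, (player, p_count, dealer, d_count, key))


player = jnp.zeros(21, dtype=jnp.int32).at[:2].set(jnp.array([10, 6]))
dealer = jnp.zeros(21, dtype=jnp.int32).at[:2].set(jnp.array([5, 4]))
two = jnp.int32(2)  # counts are passed as int32 so loop carry types stay fixed
key = jax.random.PRNGKey(0)

after_hit = step(1, player, two, dealer, two, key)
after_stick = step(0, player, two, dealer, two, key)
print(int(after_hit[1]), int(hand_total(after_stick[2])))
```

A plain Python if or while on the action or the dealer total would fail under jax.jit, because those values are tracers at compile time; the lax primitives express the same control flow in a form the compiler can stage.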
BlackJackParams (flax dataclass) provides configurable natural and sutton_and_barto boolean parameters.
BlackJackJaxEnv is the Gymnasium wrapper class that applies jax.jit to the functional environment and inherits from both FunctionalJaxEnv and EzPickle.
Usage
Use this environment for JAX-accelerated Blackjack simulations and tabular RL algorithm development. Create via gymnasium.make("tabular/Blackjack-v0").
Code Reference
Source Location
- Repository: Farama_Foundation_Gymnasium
- File: gymnasium/envs/tabular/blackjack.py
Signature
class BlackjackFunctional(
    FuncEnv[EnvState, jax.Array, int, float, bool, RenderStateType, BlackJackParams]
):
    action_space = spaces.Discrete(2)
    observation_space = spaces.Box(
        low=np.array([1, 1, 0]), high=np.array([32, 11, 1]), shape=(3,), dtype=np.int32
    )

    def initial(self, rng, params=BlackJackParams) -> EnvState
    def transition(self, state, action, key, params=BlackJackParams) -> EnvState
    def observation(self, state, rng, params=BlackJackParams) -> jax.Array
    def reward(self, state, action, next_state, rng, params=BlackJackParams) -> jax.Array
    def terminal(self, state, rng, params=BlackJackParams) -> jax.Array

class BlackJackJaxEnv(FunctionalJaxEnv, EzPickle):
    def __init__(self, render_mode: str | None = None, **kwargs)
Import
import gymnasium as gym
env = gym.make("tabular/Blackjack-v0")
# Or directly
from gymnasium.envs.tabular.blackjack import BlackjackFunctional, BlackJackJaxEnv
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| action | int (0 or 1) | Yes | 0 = stick, 1 = hit |
| render_mode | str or None | No | "rgb_array" for pixel rendering |
| natural | bool | No | Extra reward for natural blackjack (default False via params) |
| sutton_and_barto | bool | No | Follow Sutton & Barto rules (default True via params) |
Outputs
| Name | Type | Description |
|---|---|---|
| observation | jax.Array (shape (3,), int32) | [player_sum, dealer_card, usable_ace] |
| reward | float | +1 (win), -1 (lose), 0 (draw), +1.5 (natural win if enabled) |
| terminated | bool | True when hand is complete (bust or stick) |
| truncated | bool | Always False |
| info | dict | Empty dictionary |
Usage Examples
import gymnasium as gym

# Standard usage
env = gym.make("tabular/Blackjack-v0")
obs, info = env.reset(seed=42)
print(f"Player sum: {obs[0]}, Dealer showing: {obs[1]}, Usable ace: {obs[2]}")

# Play a hand
done = False
while not done:
    action = 1 if obs[0] < 17 else 0  # Simple strategy: hit below 17
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated  # step() returns both flags separately
print(f"Reward: {reward}")
env.close()