Implementation:Farama Foundation Gymnasium BlackjackFunctional

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, Tabular_Environments
Last Updated 2026-02-15 03:00 GMT

Overview

A JAX-accelerated functional implementation of the Blackjack card game environment, registered as tabular/Blackjack-v0, with JIT-compiled state transitions and PyGame-based rendering.

Description

The blackjack module implements Blackjack as a functional environment using the FuncEnv API, designed for JAX acceleration.

Game Mechanics: The game uses an infinite deck (sampling with replacement) from 13 card values where face cards count as 10 and aces can count as 1 or 11 (usable ace). The player receives two initial cards and can hit (action=1) or stick (action=0). On stick, the dealer draws until reaching 17 or more. The game ends when the player busts or sticks.
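The rules above can be sketched in a few lines of plain Python. This is an illustrative sketch for readability, not the module's JAX code; the names DECK, draw_card, usable_ace, sum_hand, and dealer_play are assumptions chosen for this example.

```python
import random

# Thirteen card values: ace = 1, 2-9 at face value, and 10/J/Q/K all count as 10.
DECK = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]


def draw_card(rng):
    """Sample one card with replacement, which models an infinite deck."""
    return rng.choice(DECK)


def usable_ace(hand):
    """An ace is usable when counting it as 11 keeps the hand at 21 or below."""
    return 1 in hand and sum(hand) + 10 <= 21


def sum_hand(hand):
    """Best hand total, counting one ace as 11 when it is usable."""
    return sum(hand) + 10 if usable_ace(hand) else sum(hand)


def dealer_play(hand, rng):
    """Dealer policy after the player sticks: draw until the total is 17 or more."""
    hand = list(hand)
    while sum_hand(hand) < 17:
        hand.append(draw_card(rng))
    return hand
```

Because every draw samples from the same 13 values, the probability of a ten-valued card is always 4/13 regardless of what has already been dealt.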

State Representation (EnvState NamedTuple): Contains dealer_hand (21-element jax array), player_hand (21-element jax array), dealer_cards (count), player_cards (count), and done flag. The 21-element arrays accommodate the maximum possible hand size.

Observation: A 3-element int32 array: [player_sum, dealer_showing_card, has_usable_ace].
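A minimal sketch of how a fixed-size, zero-padded state could map to the three-element observation is shown below. The field names follow the description above, but the helper functions hand_sum and observe are hypothetical, and the module's own code may differ in its details.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class EnvState(NamedTuple):
    """Illustrative state layout; field names follow the description above."""

    dealer_hand: jax.Array  # 21 slots, zero-padded
    player_hand: jax.Array  # 21 slots, zero-padded
    dealer_cards: int  # number of filled slots in dealer_hand
    player_cards: int  # number of filled slots in player_hand
    done: bool


def hand_sum(hand):
    """Sum a zero-padded hand, counting one ace as 11 when that does not bust."""
    total = jnp.sum(hand)
    usable = jnp.logical_and(jnp.any(hand == 1), total + 10 <= 21)
    return jnp.where(usable, total + 10, total), usable


def observe(state):
    """Build [player_sum, dealer_showing_card, has_usable_ace] as int32."""
    player_sum, usable = hand_sum(state.player_hand)
    return jnp.array([player_sum, state.dealer_hand[0], usable], dtype=jnp.int32)


# Player holds an ace and a 7; dealer shows a 10 with a 5 hidden.
player = jnp.zeros(21, dtype=jnp.int32).at[:2].set(jnp.array([1, 7]))
dealer = jnp.zeros(21, dtype=jnp.int32).at[:2].set(jnp.array([10, 5]))
obs = observe(EnvState(dealer, player, 2, 2, False))
```

Zero padding keeps the array shape constant, so unused slots contribute nothing to the sum and the same compiled function works for any hand size. For the hand above, obs is [18, 10, 1].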

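Rewards: +1 for a win, -1 for a loss, and 0 for a draw. With natural=True and sutton_and_barto=False, a winning natural blackjack (an ace plus a ten-valued card as the two initial cards) pays +1.5 instead of +1. In Sutton-Barto mode (sutton_and_barto=True), a player natural beats a dealer 21 that is not itself a natural.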
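The reward rules can be illustrated with a short plain-Python sketch. It assumes that the 1.5 bonus applies only to a hand that has already won and only when sutton_and_barto is disabled, which is how the description above reads; the function names are hypothetical and this is not the module's JAX implementation.

```python
def hand_total(hand):
    """Best total, counting one ace as 11 when that does not bust."""
    total = sum(hand)
    return total + 10 if 1 in hand and total + 10 <= 21 else total


def score(hand):
    """A busted hand scores 0 so that any standing hand beats it."""
    total = hand_total(hand)
    return 0 if total > 21 else total


def is_natural(hand):
    """A natural is exactly two cards: an ace and a ten-valued card."""
    return sorted(hand) == [1, 10]


def final_reward(player, dealer, natural=False, sutton_and_barto=True):
    """Reward once the hand is over (player bust, or stick plus dealer play)."""
    if hand_total(player) > 21:
        return -1.0  # player bust always loses
    p, d = score(player), score(dealer)
    reward = float(p > d) - float(p < d)  # +1 win, -1 loss, 0 draw
    if sutton_and_barto and is_natural(player) and not is_natural(dealer):
        return 1.0  # player natural beats a non-natural dealer 21
    if not sutton_and_barto and natural and is_natural(player) and reward == 1.0:
        return 1.5  # bonus for winning with a natural
    return reward
```

Under these assumptions, a player natural against a three-card dealer 21 is a win (+1) in Sutton-Barto mode but a draw (0) with sutton_and_barto=False, even when natural=True.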

Key JAX Features: The transition function uses jax.lax.cond to branch between hit and stick actions without breaking JIT compilation. The dealer's draw loop uses jax.lax.while_loop for JIT-compatible iteration.
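The following sketch shows the general pattern of combining jax.lax.cond and jax.lax.while_loop inside a jitted step. It is a simplified illustration under stated assumptions, not the module's transition function: the helper names (draw, dealer_turn, step) and the flat argument list are hypothetical.

```python
import jax
import jax.numpy as jnp

DECK = jnp.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10], dtype=jnp.int32)


def hand_sum(hand):
    """Sum a zero-padded hand, counting one ace as 11 when that does not bust."""
    total = jnp.sum(hand)
    usable = jnp.logical_and(jnp.any(hand == 1), total + 10 <= 21)
    return jnp.where(usable, total + 10, total)


def draw(key, hand, count):
    """Write one random card into the next free slot of a fixed-size hand."""
    card = jax.random.choice(key, DECK)
    return hand.at[count].set(card), count + 1


def dealer_turn(key, hand, count):
    """Dealer draws until reaching 17 or more, expressed as a lax.while_loop."""

    def cond_fn(carry):
        _, hand, _ = carry
        return hand_sum(hand) < 17

    def body_fn(carry):
        key, hand, count = carry
        key, subkey = jax.random.split(key)
        hand, count = draw(subkey, hand, count)
        return key, hand, count

    _, hand, count = jax.lax.while_loop(cond_fn, body_fn, (key, hand, count))
    return hand, count


@jax.jit
def step(key, player, player_n, dealer, dealer_n, action):
    """Branch on the action with lax.cond; both branches return the same structure."""

    def hit(operands):
        key, player, player_n, dealer, dealer_n = operands
        player, player_n = draw(key, player, player_n)
        return player, player_n, dealer, dealer_n

    def stick(operands):
        key, player, player_n, dealer, dealer_n = operands
        dealer, dealer_n = dealer_turn(key, dealer, dealer_n)
        return player, player_n, dealer, dealer_n

    return jax.lax.cond(action == 1, hit, stick, (key, player, player_n, dealer, dealer_n))
```

A Python if or while on a traced value would fail under jax.jit because the value is not known at trace time. The structured primitives trace both branches (or the loop body) once, which is why both branches must return arrays of identical shape and dtype.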

BlackJackParams (flax dataclass) provides configurable natural and sutton_and_barto boolean parameters.

BlackJackJaxEnv is the Gymnasium wrapper class that applies jax.jit to the functional environment and inherits from both FunctionalJaxEnv and EzPickle.

Usage

Use this environment for JAX-accelerated Blackjack simulations and tabular RL algorithm development. Create via gymnasium.make("tabular/Blackjack-v0").

Code Reference

Source Location

Signature

class BlackjackFunctional(
    FuncEnv[EnvState, jax.Array, int, float, bool, RenderStateType, BlackJackParams]
):
    action_space = spaces.Discrete(2)
    observation_space = spaces.Box(low=np.array([1, 1, 0]), high=np.array([32, 11, 1]), shape=(3,), dtype=np.int32)

    def initial(self, rng, params=BlackJackParams) -> EnvState
    def transition(self, state, action, key, params=BlackJackParams) -> EnvState
    def observation(self, state, rng, params=BlackJackParams) -> jax.Array
    def reward(self, state, action, next_state, rng, params=BlackJackParams) -> jax.Array
    def terminal(self, state, rng, params=BlackJackParams) -> jax.Array

class BlackJackJaxEnv(FunctionalJaxEnv, EzPickle):
    def __init__(self, render_mode: str | None = None, **kwargs)

Import

import gymnasium as gym
env = gym.make("tabular/Blackjack-v0")

# Or directly
from gymnasium.envs.tabular.blackjack import BlackjackFunctional, BlackJackJaxEnv

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| action | int (0 or 1) | Yes | 0 = stick, 1 = hit |
| render_mode | str or None | No | "rgb_array" for pixel rendering |
| natural | bool | No | Extra reward for natural blackjack (default False via params) |
| sutton_and_barto | bool | No | Follow Sutton & Barto rules (default True via params) |

Outputs

| Name | Type | Description |
|------|------|-------------|
| observation | jax.Array (shape (3,), int32) | [player_sum, dealer_card, usable_ace] |
| reward | float | +1 (win), -1 (lose), 0 (draw), +1.5 (natural win if enabled) |
| terminated | bool | True when hand is complete (bust or stick) |
| truncated | bool | Always False |
| info | dict | Empty dictionary |

Usage Examples

import gymnasium as gym

# Standard usage
env = gym.make("tabular/Blackjack-v0")
obs, info = env.reset(seed=42)
print(f"Player sum: {obs[0]}, Dealer showing: {obs[1]}, Usable ace: {obs[2]}")

# Play one hand with a simple threshold policy: hit below 17, otherwise stick
terminated = truncated = False
while not (terminated or truncated):
    action = 1 if obs[0] < 17 else 0
    obs, reward, terminated, truncated, info = env.step(action)

print(f"Reward: {reward}")
env.close()
