Principle:Huggingface Datasets JAX Formatting

From Leeroopedia
Knowledge Sources
Domains Data_Engineering, NLP
Last Updated 2026-02-14 18:00 GMT

Overview

JAX Formatting is the principle of converting Arrow table data to JAX arrays for use in JAX-based training and inference pipelines.

Description

When a dataset's format is set to "jax", the JAX Formatting principle governs how Arrow columns are converted to jax.Array objects. The conversion extracts NumPy arrays from Arrow, applies default dtypes (int64 for integers when the jax_enable_x64 config flag is on and int32 otherwise, float32 for floats), and calls jnp.array() to produce the result. Lists of same-shaped arrays are consolidated into a single array via jnp.stack. The target JAX device (CPU, GPU, or TPU) is selected with a device parameter that is passed as a string. The formatter resolves that string through a global mapping from device names to device objects, because jaxlib.xla_extension.Device objects cannot be serialized with pickle or dill, while a plain string can.
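The conversion steps described above can be sketched in a few lines. This is a minimal illustration, not the library's actual formatter: the helper names default_dtype_for, to_jax, and consolidate are hypothetical, and the sketch starts from a NumPy array that is assumed to have already been extracted from Arrow.

```python
import jax
import jax.numpy as jnp
import numpy as np


def default_dtype_for(value: np.ndarray):
    """Pick a default JAX dtype for a NumPy array (hypothetical helper)."""
    if np.issubdtype(value.dtype, np.integer):
        # Integer width follows JAX's x64 flag: int64 if enabled, else int32.
        return jnp.int64 if jax.config.jax_enable_x64 else jnp.int32
    if np.issubdtype(value.dtype, np.floating):
        return jnp.float32
    return None  # let jnp.array infer the dtype for everything else


def to_jax(value: np.ndarray) -> jax.Array:
    """Convert one NumPy array to a JAX array with the default dtype applied."""
    return jnp.array(value, dtype=default_dtype_for(value))


def consolidate(arrays):
    """Stack a list of arrays when all shapes match; otherwise keep the list."""
    if arrays and all(a.shape == arrays[0].shape for a in arrays):
        return jnp.stack(arrays, axis=0)
    return arrays


# Three rows of shape (2,) become one array of shape (3, 2).
rows = [to_jax(np.ones((2,), dtype=np.float64)) for _ in range(3)]
stacked = consolidate(rows)
```

Rows with differing shapes (for example, unpadded token sequences) stay as a Python list of arrays in this sketch, since jnp.stack requires identical shapes.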

Usage

Use JAX Formatting when you are training or evaluating models with JAX/Flax/Optax and want the dataset's __getitem__ to return ready-to-use JAX arrays on the appropriate device. This eliminates manual conversion and device placement code.

Theoretical Basis

JAX arrays are immutable, device-resident tensors that serve as the fundamental data structure in the JAX ecosystem. Converting from Arrow to JAX follows the NumPy extraction pattern, then wraps the result with jnp.array() inside a jax.default_device context manager to control device placement. The default integer precision depends on JAX's x64 mode setting: when x64 is disabled (the default), integers are created as int32, which matches the widest integer type JAX supports in that mode. Requesting int64 with x64 disabled would not preserve 64-bit values anyway, since JAX truncates such arrays to int32.
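The device placement pattern can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: DEVICE_MAPPING and place are hypothetical names, and the mapping is keyed by the string form of each device that JAX reports.

```python
import jax
import jax.numpy as jnp
import numpy as np

# Module-level lookup from device string to Device object (hypothetical name).
# Only the string needs to travel through pickle; the Device is resolved locally.
DEVICE_MAPPING = {str(d): d for d in jax.devices()}


def place(value: np.ndarray, device: str) -> jax.Array:
    """Create a JAX array on the device identified by its string name."""
    with jax.default_device(DEVICE_MAPPING[device]):
        return jnp.array(value)


name = str(jax.devices()[0])
x = place(np.arange(4, dtype=np.int32), name)
```

Keeping the mapping at module level means each worker process rebuilds it from its own jax.devices() call, so no Device object ever has to be serialized.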

Related Pages

Implemented By
