# Environment: Bitsandbytes HPU Gaudi Runtime
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, HPU_Backend, Dequantization |
| Last Updated | 2026-02-07 14:00 GMT |
## Overview
Habana Gaudi HPU runtime environment for running bitsandbytes NF4 dequantization on Intel Gaudi accelerators.
## Description
This environment provides the HPU (Habana Processing Unit) accelerated context for running bitsandbytes 4-bit NF4 dequantization on Intel Gaudi hardware. It requires the Habana software stack (habana_frameworks) with the habana-torch-plugin package. The backend uses the native torch.ops.hpu.dequantize_nf4 operation provided by the Habana PyTorch integration. Backward compatibility handling exists for Gaudi software versions prior to 1.22, which use a different 4-bit compression format.
## Usage
Use this environment for 4-bit NF4 dequantization on Habana Gaudi accelerators. The HPU backend is automatically detected when habana_frameworks is importable and torch.hpu is available. Currently only NF4 quantization type is supported on HPU.
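The detection described above can be sketched as follows. `hpu_available` is a hypothetical helper written for illustration, not a bitsandbytes API; it mirrors the two conditions named in this section (importable `habana_frameworks`, working `torch.hpu`):

```python
import importlib.util


def hpu_available() -> bool:
    """Rough sketch of HPU backend detection (illustrative, not the library's code)."""
    # habana_frameworks must be importable at all.
    if importlib.util.find_spec("habana_frameworks") is None:
        return False
    # Importing habana_frameworks.torch registers the 'hpu' device with PyTorch.
    import habana_frameworks.torch  # noqa: F401
    import torch

    return hasattr(torch, "hpu") and torch.hpu.is_available()
```

On a machine without the Habana stack the helper short-circuits and returns `False` without ever importing torch.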
## System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Hardware | Intel Gaudi / Gaudi2 / Gaudi3 accelerator | Habana-designed AI accelerator |
| OS | Linux (Ubuntu recommended) | Primary supported platform |
| Gaudi Software | >= 1.21 | Version 1.22+ recommended for current compression format |
| Python | >= 3.10 | From pyproject.toml |
| PyTorch | >= 2.3, < 3 | Must have Habana PyTorch plugin |
## Dependencies
### System Packages
- Habana software stack (habana_frameworks Python package)
- habana-torch-plugin (detected via pip list)
- Gaudi driver and runtime
### Python Packages
- `torch` >= 2.3, < 3
- `habana_frameworks`
- `habana_frameworks.torch`
- `numpy` >= 1.17
- `packaging` >= 20.9
## Credentials
No secrets or credentials required. The Gaudi software stack is detected via standard Python imports.
## Quick Install

```shell
# Install the Habana software stack first (follow Habana documentation),
# then install bitsandbytes:
pip install bitsandbytes

# Verify HPU detection (importing habana_frameworks.torch registers the hpu device)
python -c "import habana_frameworks.torch, torch; print(torch.hpu.is_available())"
python -m bitsandbytes
```
## Code Evidence
Gaudi SW version detection from `bitsandbytes/backends/utils.py`:
```python
import subprocess


def get_gaudi_sw_version():
    output = subprocess.run(
        "pip list | grep habana-torch-plugin",
        shell=True,
        text=True,
        capture_output=True,
    )
```
Backward compatibility check from `bitsandbytes/backends/hpu/ops.py`:

```python
# Version check for compression format compatibility
if GAUDI_SW_VER.major < 1 or GAUDI_SW_VER.minor < 22:
    # Use reversed compression format for older Gaudi SW
    ...
```
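The quoted check compares `major` and `minor` separately; with a parsed `Version` object the same "before 1.22" threshold can be expressed as a single comparison (a sketch, not the library's code — `uses_reversed_format` is an illustrative name):

```python
from packaging.version import Version

OLD_FORMAT_CUTOFF = Version("1.22")


def uses_reversed_format(gaudi_sw_ver: Version) -> bool:
    """Gaudi SW releases before 1.22 use the older, reversed 4-bit compression format."""
    return gaudi_sw_ver < OLD_FORMAT_CUTOFF


print(uses_reversed_format(Version("1.21.0")))  # → True
print(uses_reversed_format(Version("1.22.0")))  # → False
```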
HPU kernel dispatch from `bitsandbytes/backends/hpu/ops.py`:

```python
@register_kernel("bitsandbytes::dequantize_4bit", "hpu")
def _(A, absmax, blocksize, quant_type, shape, dtype):
    # Delegates to torch.ops.hpu.dequantize_nf4
    ...
```
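On HPU the actual work happens inside `torch.ops.hpu.dequantize_nf4`, but the NF4 scheme itself can be illustrated with a plain NumPy reference. The 16 codebook levels are the standard NF4 quantiles from the QLoRA paper; the nibble order (high nibble first) and the uint8 two-values-per-byte layout are assumptions of this sketch, which is not the HPU kernel:

```python
import numpy as np

# The 16 NF4 levels (normal-float quantiles, as defined in the QLoRA paper).
NF4_CODE = np.array([
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.8246703147888184, 1.0,
], dtype=np.float32)


def dequantize_nf4_reference(packed: np.ndarray, absmax: np.ndarray, blocksize: int) -> np.ndarray:
    """Reference NF4 dequantization: unpack nibbles, look up levels, rescale per block."""
    # Each uint8 holds two 4-bit indices (assumed high nibble first here).
    high = packed >> 4
    low = packed & 0x0F
    idx = np.stack([high, low], axis=-1).reshape(-1)
    values = NF4_CODE[idx]
    # Each block of `blocksize` values shares one absmax scale.
    scales = np.repeat(absmax.astype(np.float32), blocksize)
    return values * scales[: values.size]


packed = np.array([0x7F, 0x07], dtype=np.uint8)  # nibble indices 7, 15, 0, 7
out = dequantize_nf4_reference(packed, np.array([2.0]), blocksize=4)
print(out)  # indices map to levels [0.0, 1.0, -1.0, 0.0], scaled by absmax=2.0
```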
## Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `habana_frameworks` not found | Habana software stack not installed | Install the full Habana software stack per Habana documentation |
| `torch.hpu` not available | PyTorch not compiled with Habana support | Install the Habana-compatible PyTorch build |
| Unsupported quant_type on HPU | `quant_type="fp4"` passed on HPU | Only NF4 is supported on HPU; use `quant_type="nf4"` |
## Compatibility Notes
- Quantization types: Only NF4 is supported on HPU. FP4 is not available.
- Storage formats: Supports both uint8 and bfloat16 quant_storage.
- Gaudi SW < 1.22: Uses reversed 4-bit compression format; handled automatically by the backend.
- Gaudi SW >= 1.22: Uses current compression format with direct dequantize_nf4 dispatch.
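The "reversed" format difference amounts to the order in which the two 4-bit values sit inside each byte. A toy illustration of the distinction (the exact pre-1.22 layout is an assumption here; this only makes "reversed nibble order" concrete):

```python
import numpy as np

byte = np.uint8(0xAB)

# One reading order: high nibble first, then low nibble.
current_order = (byte >> 4, byte & 0x0F)   # (0xA, 0xB)

# Reversed reading order: low nibble first, then high nibble.
reversed_order = (byte & 0x0F, byte >> 4)  # (0xB, 0xA)

print(current_order, reversed_order)  # → (10, 11) (11, 10)
```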