# Environment: Bitsandbytes HPU Gaudi Runtime
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, HPU_Backend, Dequantization |
| Last Updated | 2026-02-07 14:00 GMT |
## Overview
Habana Gaudi HPU runtime environment for running bitsandbytes NF4 dequantization on Intel Gaudi accelerators.
## Description
This environment provides the HPU (Habana Processing Unit) accelerated context for running bitsandbytes 4-bit NF4 dequantization on Intel Gaudi hardware. It requires the Habana software stack (habana_frameworks) with the habana-torch-plugin package. The backend uses the native torch.ops.hpu.dequantize_nf4 operation provided by the Habana PyTorch integration. Backward compatibility handling exists for Gaudi software versions prior to 1.22, which use a different 4-bit compression format.
## Usage
Use this environment for 4-bit NF4 dequantization on Habana Gaudi accelerators. The HPU backend is automatically detected when habana_frameworks is importable and torch.hpu is available. Currently only NF4 quantization type is supported on HPU.
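The detection described above can be sketched as follows. `hpu_available` is a hypothetical helper written for illustration, not a bitsandbytes API; it mirrors the two conditions named in this section (importable `habana_frameworks`, working `torch.hpu`):

```python
import importlib.util


def hpu_available() -> bool:
    """Rough sketch of HPU backend detection (illustrative, not the library's code)."""
    # habana_frameworks must be importable at all.
    if importlib.util.find_spec("habana_frameworks") is None:
        return False
    # Importing habana_frameworks.torch registers the 'hpu' device with PyTorch.
    import habana_frameworks.torch  # noqa: F401
    import torch

    return hasattr(torch, "hpu") and torch.hpu.is_available()
```

On a machine without the Habana stack the helper short-circuits and returns `False` without ever importing torch.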
## System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Hardware | Intel Gaudi / Gaudi2 / Gaudi3 accelerator | Habana-designed AI accelerator |
| OS | Linux (Ubuntu recommended) | Primary supported platform |
| Gaudi Software | >= 1.21 | Version 1.22+ recommended for current compression format |
| Python | >= 3.10 | From pyproject.toml |
| PyTorch | >= 2.3, < 3 | Must have Habana PyTorch plugin |
## Dependencies
### System Packages
- Habana software stack (habana_frameworks Python package)
- habana-torch-plugin (detected via pip list)
- Gaudi driver and runtime
### Python Packages
- `torch` >= 2.3, < 3
- `habana_frameworks`
- `habana_frameworks.torch`
- `numpy` >= 1.17
- `packaging` >= 20.9
## Credentials
No secrets or credentials required. The Gaudi software stack is detected via standard Python imports.
## Quick Install

```shell
# Install the Habana software stack first (follow Habana documentation),
# then install bitsandbytes:
pip install bitsandbytes

# Verify HPU detection (importing habana_frameworks.torch registers the hpu device)
python -c "import habana_frameworks.torch, torch; print(torch.hpu.is_available())"
python -m bitsandbytes
```
## Code Evidence
Gaudi SW version detection from `bitsandbytes/backends/utils.py`:
```python
import subprocess


def get_gaudi_sw_version():
    output = subprocess.run(
        "pip list | grep habana-torch-plugin",
        shell=True,
        text=True,
        capture_output=True,
    )
```
Backward compatibility check from `bitsandbytes/backends/hpu/ops.py`:

```python
# Version check for compression format compatibility
if GAUDI_SW_VER.major < 1 or GAUDI_SW_VER.minor < 22:
    # Use reversed compression format for older Gaudi SW
    ...
```
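The quoted check compares `major` and `minor` separately; with a parsed `Version` object the same "before 1.22" threshold can be expressed as a single comparison (a sketch, not the library's code — `uses_reversed_format` is an illustrative name):

```python
from packaging.version import Version

OLD_FORMAT_CUTOFF = Version("1.22")


def uses_reversed_format(gaudi_sw_ver: Version) -> bool:
    """Gaudi SW releases before 1.22 use the older, reversed 4-bit compression format."""
    return gaudi_sw_ver < OLD_FORMAT_CUTOFF


print(uses_reversed_format(Version("1.21.0")))  # → True
print(uses_reversed_format(Version("1.22.0")))  # → False
```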
HPU kernel dispatch from `bitsandbytes/backends/hpu/ops.py`:

```python
@register_kernel("bitsandbytes::dequantize_4bit", "hpu")
def _(A, absmax, blocksize, quant_type, shape, dtype):
    # Delegates to torch.ops.hpu.dequantize_nf4
    ...
```
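On HPU the actual work happens inside `torch.ops.hpu.dequantize_nf4`, but the NF4 scheme itself can be illustrated with a plain NumPy reference. The 16 codebook levels are the standard NF4 quantiles from the QLoRA paper; the nibble order (high nibble first) and the uint8 two-values-per-byte layout are assumptions of this sketch, which is not the HPU kernel:

```python
import numpy as np

# The 16 NF4 levels (normal-float quantiles, as defined in the QLoRA paper).
NF4_CODE = np.array([
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.8246703147888184, 1.0,
], dtype=np.float32)


def dequantize_nf4_reference(packed: np.ndarray, absmax: np.ndarray, blocksize: int) -> np.ndarray:
    """Reference NF4 dequantization: unpack nibbles, look up levels, rescale per block."""
    # Each uint8 holds two 4-bit indices (assumed high nibble first here).
    high = packed >> 4
    low = packed & 0x0F
    idx = np.stack([high, low], axis=-1).reshape(-1)
    values = NF4_CODE[idx]
    # Each block of `blocksize` values shares one absmax scale.
    scales = np.repeat(absmax.astype(np.float32), blocksize)
    return values * scales[: values.size]


packed = np.array([0x7F, 0x07], dtype=np.uint8)  # nibble indices 7, 15, 0, 7
out = dequantize_nf4_reference(packed, np.array([2.0]), blocksize=4)
print(out)  # indices map to levels [0.0, 1.0, -1.0, 0.0], scaled by absmax=2.0
```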
## Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `habana_frameworks` not found | Habana software stack not installed | Install the full Habana software stack per Habana documentation |
| `torch.hpu` not available | PyTorch not compiled with Habana support | Install the Habana-compatible PyTorch build |
| Unsupported quant_type on HPU | `quant_type="fp4"` passed on HPU | Only NF4 is supported on HPU; use `quant_type="nf4"` |
## Compatibility Notes
- Quantization types: Only NF4 is supported on HPU. FP4 is not available.
- Storage formats: Supports both uint8 and bfloat16 quant_storage.
- Gaudi SW < 1.22: Uses reversed 4-bit compression format; handled automatically by the backend.
- Gaudi SW >= 1.22: Uses current compression format with direct dequantize_nf4 dispatch.
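The "reversed" format difference amounts to the order in which the two 4-bit values sit inside each byte. A toy illustration of the distinction (the exact pre-1.22 layout is an assumption here; this only makes "reversed nibble order" concrete):

```python
import numpy as np

byte = np.uint8(0xAB)

# One reading order: high nibble first, then low nibble.
current_order = (byte >> 4, byte & 0x0F)   # (0xA, 0xB)

# Reversed reading order: low nibble first, then high nibble.
reversed_order = (byte & 0x0F, byte >> 4)  # (0xB, 0xA)

print(current_order, reversed_order)  # → (10, 11) (11, 10)
```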