Principle: Bitsandbytes HPU 4-bit Dequantization
| Knowledge Sources | |
|---|---|
| Domains | HPU_Backend, Dequantization, 4bit_Quantization |
| Last Updated | 2026-02-07 13:31 GMT |
Overview
Hardware-accelerated NF4 dequantization on Habana Gaudi processors using native HPU operations with backward-compatible format handling.
Description
Habana Gaudi accelerators provide a native dequantize_nf4 operation that efficiently converts NF4-quantized tensors back to floating-point. This principle addresses the integration of that native operation into the bitsandbytes multi-backend dispatch system, including handling format differences between Gaudi software versions. Older Gaudi SW (pre-1.22) used a different 4-bit nibble ordering (reversed high/low nibbles), requiring a compatibility shim that swaps nibbles before calling the native dequantization.
Usage
Apply this principle when deploying quantized models on Habana Gaudi hardware. It is automatically engaged through the bitsandbytes backend dispatch when an HPU device is detected. Only NF4 quantization is supported on HPU.
Theoretical Basis
The dequantization follows the standard NF4 formula:
$$ x_i = \text{NF4\_table}[q_i] \times \text{absmax}_{\lfloor i / B \rfloor} $$
where $q_i$ is the 4-bit code of element $i$ and $B$ is the quantization block size.
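As a reference for the formula, here is a minimal pure-Python dequantizer. The 16-entry codebook is intended to mirror the bitsandbytes NF4 levels; treat the exact constants as a sketch and verify them against the library before relying on them.

```python
# Minimal reference NF4 dequantizer implementing
#   x_i = NF4_table[q_i] * absmax[i // B]
# The codebook below is meant to mirror the bitsandbytes NF4 levels;
# verify the exact constants against the library before depending on them.
NF4_TABLE = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]

def dequantize_nf4(codes, absmax, block_size):
    """codes: 4-bit indices (0..15); absmax: one scale per block."""
    return [NF4_TABLE[q] * absmax[i // block_size] for i, q in enumerate(codes)]
```

For example, `dequantize_nf4([0, 7, 15], [2.0], 3)` maps the extreme and zero codes of one block with scale 2.0 to `[-2.0, 0.0, 2.0]`.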
The Gaudi-specific consideration is nibble ordering in the compressed format:
```python
# Gaudi SW < 1.22 uses reversed nibble order.
# Standard packing: high nibble = first value, low nibble = second value.
# Old Gaudi packing: low nibble = first value, high nibble = second value.
def reverse_4bit_compress_format(weight):
    # Swap the high and low nibbles of every packed byte.
    out_1 = (weight & 0xF0) >> 4
    out_2 = (weight & 0x0F) << 4
    return out_1 | out_2
```
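The nibble swap is an involution: applying it twice restores the original packing. A scalar demonstration (the shim above is applied to packed weight tensors; plain ints suffice to show the bit logic):

```python
# Scalar form of the nibble-swap shim, shown on a single packed byte.
def reverse_4bit_compress_format(weight):
    # Swap the high and low nibbles.
    return ((weight & 0xF0) >> 4) | ((weight & 0x0F) << 4)

# Byte 0xAB packs nibble 0xA (high) and 0xB (low); swapping yields 0xBA.
swapped = reverse_4bit_compress_format(0xAB)      # 0xBA
restored = reverse_4bit_compress_format(swapped)  # back to 0xAB
```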