Principle: Bitsandbytes HPU 4-bit Dequantization
| Knowledge Sources | |
|---|---|
| Domains | HPU_Backend, Dequantization, 4bit_Quantization |
| Last Updated | 2026-02-07 13:31 GMT |
Overview
Hardware-accelerated NF4 dequantization on Habana Gaudi processors using native HPU operations with backward-compatible format handling.
Description
Habana Gaudi accelerators provide a native dequantize_nf4 operation that efficiently converts NF4-quantized tensors back to floating-point. This principle addresses the integration of that native operation into the bitsandbytes multi-backend dispatch system, including handling format differences between Gaudi software versions. Older Gaudi SW (pre-1.22) used a different 4-bit nibble ordering (reversed high/low nibbles), requiring a compatibility shim that swaps nibbles before calling the native dequantization.
Usage
Apply this principle when deploying quantized models on Habana Gaudi hardware. It is automatically engaged through the bitsandbytes backend dispatch when an HPU device is detected. Only NF4 quantization is supported on HPU.
Theoretical Basis
The dequantization follows the standard NF4 formula:
$$ x_i = \text{NF4\_table}[q_i] \times \text{absmax}_{\lfloor i / B \rfloor} $$
where $q_i$ is the 4-bit code of element $i$ and $B$ is the quantization block size.
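As a reference for the formula, here is a minimal pure-Python dequantizer. The 16-entry codebook is intended to mirror the bitsandbytes NF4 levels; treat the exact constants as a sketch and verify them against the library before relying on them.

```python
# Minimal reference NF4 dequantizer implementing
#   x_i = NF4_table[q_i] * absmax[i // B]
# The codebook below is meant to mirror the bitsandbytes NF4 levels;
# verify the exact constants against the library before depending on them.
NF4_TABLE = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]

def dequantize_nf4(codes, absmax, block_size):
    """codes: 4-bit indices (0..15); absmax: one scale per block."""
    return [NF4_TABLE[q] * absmax[i // block_size] for i, q in enumerate(codes)]
```

For example, `dequantize_nf4([0, 7, 15], [2.0], 3)` maps the extreme and zero codes of one block with scale 2.0 to `[-2.0, 0.0, 2.0]`.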
The Gaudi-specific consideration is nibble ordering in the compressed format:
```python
# Gaudi SW < 1.22 uses reversed nibble order.
# Standard packing: high nibble = first value, low nibble = second value.
# Old Gaudi packing: low nibble = first value, high nibble = second value.
def reverse_4bit_compress_format(weight):
    # Swap the high and low nibbles of every packed byte.
    out_1 = (weight & 0xF0) >> 4
    out_2 = (weight & 0x0F) << 4
    return out_1 | out_2
```
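The nibble swap is an involution: applying it twice restores the original packing. A scalar demonstration (the shim above is applied to packed weight tensors; plain ints suffice to show the bit logic):

```python
# Scalar form of the nibble-swap shim, shown on a single packed byte.
def reverse_4bit_compress_format(weight):
    # Swap the high and low nibbles.
    return ((weight & 0xF0) >> 4) | ((weight & 0x0F) << 4)

# Byte 0xAB packs nibble 0xA (high) and 0xB (low); swapping yields 0xBA.
swapped = reverse_4bit_compress_format(0xAB)      # 0xBA
restored = reverse_4bit_compress_format(swapped)  # back to 0xAB
```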