
Principle:Bitsandbytes foundation Bitsandbytes HPU 4bit Dequantization

From Leeroopedia


Knowledge Sources
Domains HPU_Backend, Dequantization, 4bit_Quantization
Last Updated 2026-02-07 13:31 GMT

Overview

Hardware-accelerated NF4 dequantization on Habana Gaudi processors using native HPU operations with backward-compatible format handling.

Description

Habana Gaudi accelerators provide a native dequantize_nf4 operation that efficiently converts NF4-quantized tensors back to floating-point. This principle addresses the integration of that native operation into the bitsandbytes multi-backend dispatch system, including handling format differences between Gaudi software versions. Older Gaudi SW (pre-1.22) used a different 4-bit nibble ordering (reversed high/low nibbles), requiring a compatibility shim that swaps nibbles before calling the native dequantization.

Usage

Apply this principle when deploying quantized models on Habana Gaudi hardware. It is automatically engaged through the bitsandbytes backend dispatch when an HPU device is detected. Only NF4 quantization is supported on HPU.

Theoretical Basis

The dequantization follows the standard NF4 formula:

<math>x_i = \text{NF4\_table}[q_i] \times \text{absmax}_{\lfloor i / B \rfloor}</math>

where <math>q_i</math> is the 4-bit code index of element <math>i</math>, <math>B</math> is the quantization block size, and one absmax scale is stored per block.
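The formula can be sketched as a NumPy reference implementation operating on already-unpacked 4-bit indices (not the packed byte format). The function name `dequantize_nf4_reference` is illustrative, not part of the bitsandbytes API; the table values are the 16 NF4 quantile levels used by bitsandbytes.

```python
import numpy as np

# NF4 code table: 16 quantile levels of a standard normal distribution,
# normalized to [-1, 1], as used by bitsandbytes for NF4 quantization.
NF4_TABLE = np.array([
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
], dtype=np.float32)

def dequantize_nf4_reference(q, absmax, blocksize):
    """Dequantize unpacked 4-bit indices q (values 0..15), applying one
    absmax scale per block of `blocksize` elements."""
    scales = np.repeat(absmax, blocksize)[: q.size]
    return NF4_TABLE[q] * scales
```

Index 0 maps to -1.0 and index 15 to +1.0, so a block with absmax 2.0 dequantizes those codes to -2.0 and +2.0 respectively.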

The Gaudi-specific consideration is nibble ordering in the compressed format:

# Gaudi SW < 1.22 uses reversed nibble order
# Standard: high nibble = first value, low nibble = second value
# Old Gaudi: low nibble = first value, high nibble = second value
def reverse_4bit_compress_format(weight):
    out_1 = (weight & 0xF0) >> 4
    out_2 = (weight & 0xF) << 4
    return out_1 | out_2
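A quick sanity check of the shim: swapping nibbles is its own inverse, so applying it twice restores the original byte. The snippet restates the function on plain Python ints so it is self-contained; in bitsandbytes the same bitwise operations run on uint8 tensors.

```python
def reverse_4bit_compress_format(weight):
    # Swap the high and low nibbles of each byte.
    out_1 = (weight & 0xF0) >> 4
    out_2 = (weight & 0x0F) << 4
    return out_1 | out_2

# 0x12 packs the nibbles (1, 2); the swap yields 0x21, i.e. (2, 1).
swapped = reverse_4bit_compress_format(0x12)
# Applying the swap twice is the identity.
restored = reverse_4bit_compress_format(swapped)
```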
