
Principle:Bitsandbytes foundation Bitsandbytes FSDP 4bit Quantization

From Leeroopedia


Metadata

Sources: Paper: QLoRA; Blog: FSDP QLoRA; Repo: bitsandbytes
Domains: Quantization, Distributed_Training
Last updated: 2026-02-07 14:00 GMT

Overview

A specialized 4-bit quantization approach that stores quantized weights in a float dtype (e.g., bfloat16) to enable compatibility with FSDP parameter sharding.

Description

Standard 4-bit quantization stores packed weights in uint8, but FSDP requires all parameters to share a uniform dtype for sharding and all-gather operations. The solution is quant_storage=torch.bfloat16: pack 4-bit values into bfloat16 tensors instead of uint8. This allows FSDP to treat quantized weights as regular bfloat16 parameters for sharding, while the actual data remains 4-bit quantized.

When loading the model, the torch_dtype parameter must match quant_storage; otherwise the quantized weights and the unquantized parameters (embeddings, layer norms) end up in different dtypes, and FSDP cannot flatten them into a single uniformly typed shard.

A critical helper function fix_4bit_weight_quant_state_from_module recovers quantization state (QuantState) that may be lost during FSDP shard/unshard operations. This function is called at the start of every Linear4bit.forward() to ensure the weight tensor always has its quantization metadata available.
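The recovery pattern can be sketched with stand-in classes. Note that Linear4bitSketch, _saved_quant_state, and this simplified QuantState are illustrative names for this sketch, not the actual bitsandbytes implementation:

```python
class QuantState:
    """Simplified stand-in for quantization metadata (blocksize, quant type)."""
    def __init__(self, blocksize, quant_type):
        self.blocksize = blocksize
        self.quant_type = quant_type

class Weight:
    """Stand-in for a parameter; FSDP re-creates params, dropping attributes."""
    def __init__(self):
        self.quant_state = None

class Linear4bitSketch:
    def __init__(self):
        self.weight = Weight()
        # The module keeps its own copy of the quantization metadata
        self._saved_quant_state = QuantState(blocksize=64, quant_type="nf4")

    def fix_weight_quant_state(self):
        # Restore metadata lost during an FSDP shard/unshard round-trip
        if getattr(self.weight, "quant_state", None) is None:
            self.weight.quant_state = self._saved_quant_state

    def forward(self, x):
        self.fix_weight_quant_state()  # called at the start of every forward
        # ... dequantize self.weight using self.weight.quant_state, then matmul
        return x

layer = Linear4bitSketch()
layer.weight.quant_state = None  # simulate an FSDP unshard losing the state
layer.forward(None)
```

After forward(), the weight's quant_state is restored from the module-level copy, which is the essence of the recovery the real helper performs.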

Usage

Required for distributed fine-tuning of large models (e.g., 70B parameters) across multiple GPUs using FSDP. Enables training models that would not fit on a single GPU even with quantization.

Typical configuration:

from transformers import BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_storage=torch.bfloat16,  # Key for FSDP compatibility
)
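Assuming the model is loaded through transformers, the matching torch_dtype would be passed alongside this config ("model-id" below is a placeholder, not a real checkpoint):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_storage=torch.bfloat16,
)

# torch_dtype matches bnb_4bit_quant_storage, so every parameter FSDP
# sees (quantized and unquantized alike) shares one uniform dtype
model = AutoModelForCausalLM.from_pretrained(
    "model-id",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
```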

Theoretical Basis

FSDP shards model parameters across data-parallel ranks. Each rank holds 1/N of each parameter. For this to work, all parameters must be in a uniform dtype that supports sharding (gather/scatter operations).
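The sharding and all-gather arithmetic can be illustrated with plain Python lists standing in for a flattened parameter buffer (world_size and the buffer contents are assumed values):

```python
# Hypothetical flat parameter buffer and world size
world_size = 4
flat_param = list(range(16))  # stand-in for a flattened parameter tensor

# FSDP-style even sharding: each rank owns a contiguous 1/N slice
shard_size = len(flat_param) // world_size
shards = [flat_param[r * shard_size:(r + 1) * shard_size]
          for r in range(world_size)]

# All-gather: concatenating every rank's shard reconstructs the full buffer
gathered = [x for shard in shards for x in shard]
assert gathered == flat_param
```

The split-and-concatenate round trip only works when the buffer has a single element type, which is why a uniform dtype is a hard requirement.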

By storing 4-bit data in bfloat16 containers, we satisfy FSDP's dtype requirement while maintaining 4-bit compression. The quant_storage dtype is set on both the Linear4bit module (for state recovery) and the Params4bit weight (for actual storage).

The key insight is that the container dtype (bfloat16) is separate from the data representation (4-bit quantized values). FSDP operates on the container dtype for sharding and communication, while the quantized data inside those containers is preserved transparently.
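A minimal stdlib sketch of this container-vs-data separation, using 16-bit unsigned integers as a stand-in for bfloat16 containers (the codes and the high/low-nibble packing layout are illustrative, not the actual NF4 storage layout):

```python
import struct

# Hypothetical 4-bit quantization codes (0..15) for eight weights
codes = [3, 15, 0, 7, 9, 1, 12, 4]

# Pack two 4-bit codes per byte, as a uint8 storage buffer would
packed_bytes = bytes((codes[i] << 4) | codes[i + 1]
                     for i in range(0, len(codes), 2))

# Reinterpret the same bytes as 16-bit containers (bfloat16 stand-in):
# half as many elements, identical underlying bits
containers = struct.unpack(f"<{len(packed_bytes) // 2}H", packed_bytes)

# Round trip through the container view preserves the packed bits exactly
restored = struct.pack(f"<{len(containers)}H", *containers)
assert restored == packed_bytes

# The original 4-bit codes are still recoverable from the containers
unpacked = []
for b in restored:
    unpacked += [b >> 4, b & 0x0F]
assert unpacked == codes
```

The containers are never interpreted as numbers; they are only a dtype label that satisfies FSDP's sharding machinery while the quantized payload passes through untouched.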
