Environment:Huggingface Peft BitsAndBytes Quantization
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Quantization |
| Last Updated | 2026-02-07 06:44 GMT |
Overview
Optional bitsandbytes dependency for 4-bit and 8-bit quantized model training with PEFT adapters (QLoRA workflow).
Description
This environment adds the `bitsandbytes` library as an optional dependency for running quantized model fine-tuning. When available, PEFT registers quantized linear layer variants (`Linear8bitLt` for 8-bit, `Linear4bit` for 4-bit) for LoRA, AdaLoRA, IA3, OFT, VeRA, RandLoRA, and RoAd adapter methods. This enables the QLoRA workflow: loading a pre-trained model in 4-bit quantized format and attaching trainable LoRA adapters on top. Availability of `bitsandbytes` is detected via `importlib.util.find_spec("bitsandbytes")` and cached with `@lru_cache`.
Usage
Use this environment when you need to:
- Fine-tune models using QLoRA (4-bit quantized base + LoRA adapters)
- Run 8-bit training to reduce VRAM usage
- Call `prepare_model_for_kbit_training()` on a model loaded with `BitsAndBytesConfig`
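A minimal QLoRA setup sketch covering the steps above. The checkpoint name and LoRA hyperparameters (`r`, `lora_alpha`, `lora_dropout`) are illustrative assumptions, not recommendations; running it requires a CUDA GPU with `bitsandbytes` installed:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization with bfloat16 compute (typical QLoRA settings)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# "facebook/opt-350m" is just a small example checkpoint
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m", quantization_config=bnb_config, device_map="auto"
)

# Prepare the quantized model for k-bit training (gradient checkpointing,
# dtype adjustments) before attaching adapters
model = prepare_model_for_kbit_training(model)

# Attach trainable LoRA adapters on top of the frozen 4-bit base
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```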
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Hardware | NVIDIA GPU with CUDA support | Required by bitsandbytes |
| VRAM | 8GB+ recommended | 4-bit quantization significantly reduces memory |
| OS | Linux (preferred) | Windows via WSL2; limited native Windows support |
Dependencies
Python Packages
- `bitsandbytes` (any version enables 8-bit; 4-bit requires a version that provides `bnb.nn.Linear4bit`)
- All core PEFT dependencies (see Environment:Huggingface_Peft_Python_Core_Dependencies)
Credentials
No additional credentials required beyond the core environment.
Quick Install
```shell
# Install bitsandbytes for quantized training
pip install bitsandbytes

# Full QLoRA setup
pip install peft bitsandbytes transformers accelerate
```
Code Evidence
Availability check from `src/peft/import_utils.py:24-25`:

```python
@lru_cache
def is_bnb_available() -> bool:
    return importlib.util.find_spec("bitsandbytes") is not None
```
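The same cached `find_spec` pattern works for probing any optional dependency; a standalone sketch (the function name and package names here are arbitrary):

```python
import importlib.util
from functools import lru_cache


@lru_cache
def is_package_available(name: str) -> bool:
    # find_spec returns None when the package cannot be imported;
    # lru_cache ensures the filesystem probe runs at most once per name
    return importlib.util.find_spec(name) is not None


print(is_package_available("json"))                  # True: stdlib, always present
print(is_package_available("definitely_not_a_pkg"))  # False: missing package
```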
4-bit availability check from `src/peft/import_utils.py:29-35`:

```python
@lru_cache
def is_bnb_4bit_available() -> bool:
    if not is_bnb_available():
        return False

    import bitsandbytes as bnb

    return hasattr(bnb.nn, "Linear4bit")
```
Conditional LoRA layer registration from `src/peft/tuners/lora/bnb.py:33`:

```python
if is_bnb_available():

    class Linear8bitLt(torch.nn.Module, LoraLayer):
        # 8-bit quantized LoRA linear layer
        ...
```
4-bit LoRA layer from `src/peft/tuners/lora/bnb.py:309`:

```python
if is_bnb_4bit_available():

    class Linear4bit(torch.nn.Module, LoraLayer):
        # 4-bit quantized LoRA linear layer
        ...
```
Quantization detection in `src/peft/utils/other.py:149`:

```python
loaded_in_kbit = getattr(model, "is_loaded_in_8bit", False) or getattr(
    model, "is_loaded_in_4bit", False
)
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ImportError: bitsandbytes` | bitsandbytes not installed | `pip install bitsandbytes` |
| `CUDA not available` | No NVIDIA GPU or CUDA not configured | Install CUDA toolkit and NVIDIA drivers |
| `Linear4bit not found` | Old version of bitsandbytes without 4-bit support | Upgrade to latest bitsandbytes |
Compatibility Notes
- Adapter methods with bnb support: LoRA, AdaLoRA, IA3, OFT, VeRA, RandLoRA, RoAd all have dedicated `bnb.py` files with quantized layer variants.
- GPTQ models: detected separately via `quantization_method == "gptq"`; they do NOT use bitsandbytes layers.
- fp32 upcasting: For non-GPTQ/AQLM/EETQ quantized models, `prepare_model_for_kbit_training` casts float16/bfloat16 params to float32 (except Params4bit).
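A sketch of that upcasting rule with plain torch parameters. The helper name is invented, and the `Params4bit` exclusion is represented here by a class-name check (an assumption about how to skip bnb's packed 4-bit storage), so this is an illustration of the rule, not PEFT's actual implementation:

```python
import torch


def upcast_kbit_params(model: torch.nn.Module) -> None:
    # Half-precision floats are upcast to fp32 for training stability;
    # 4-bit quantized weights (bnb's Params4bit) keep their packed storage.
    for param in model.parameters():
        if param.__class__.__name__ == "Params4bit":
            continue  # quantized storage must not be upcast
        if param.dtype in (torch.float16, torch.bfloat16):
            param.data = param.data.to(torch.float32)


layer = torch.nn.Linear(4, 4).to(torch.float16)
upcast_kbit_params(layer)
print(layer.weight.dtype)  # torch.float32
```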