
Environment:Huggingface Peft BitsAndBytes Quantization

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Quantization
Last Updated: 2026-02-07 06:44 GMT

Overview

Optional bitsandbytes dependency for 4-bit and 8-bit quantized model training with PEFT adapters (QLoRA workflow).

Description

This environment adds the `bitsandbytes` library as an optional dependency for quantized model fine-tuning. When it is available, PEFT registers quantized linear layer variants (`Linear8bitLt` for 8-bit, `Linear4bit` for 4-bit) for the LoRA, AdaLoRA, IA3, OFT, VeRA, RandLoRA, and RoAd adapter methods. This enables the QLoRA workflow: loading a pre-trained model in 4-bit quantized format and attaching trainable LoRA adapters on top. Availability of `bitsandbytes` is checked via `importlib.util.find_spec("bitsandbytes")` and cached with `@lru_cache`.

Usage

Use this environment when you need to:

  • Fine-tune models using QLoRA (4-bit quantized base + LoRA adapters)
  • Run 8-bit training to reduce VRAM usage
  • Call `prepare_model_for_kbit_training()` on a model loaded with `BitsAndBytesConfig`
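The steps above can be sketched end to end. This is a minimal QLoRA setup, not the only valid one: it assumes a CUDA GPU, the packages from Quick Install below, and uses `facebook/opt-350m` with `q_proj`/`v_proj` target modules purely as illustrative choices.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantization config for the frozen base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Example model; any causal LM supported by bitsandbytes works
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Trainable LoRA adapters on top of the 4-bit base
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The resulting model can be passed directly to a standard `transformers` `Trainer`; only the LoRA parameters receive gradients.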

System Requirements

  • Hardware: NVIDIA GPU with CUDA support (required by bitsandbytes)
  • VRAM: 8 GB+ recommended; 4-bit quantization significantly reduces memory
  • OS: Linux preferred; Windows via WSL2 (limited native Windows support)
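To make the VRAM figure concrete, here is a back-of-the-envelope weight-only estimate for a hypothetical 7B-parameter model. It deliberately ignores activations, the KV cache, LoRA adapter weights, and quantization block overhead, so treat the numbers as lower bounds.

```python
def model_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Weight memory only: parameters x bits, converted to GiB."""
    return n_params * bits_per_param / 8 / 2**30

# Hypothetical 7B-parameter model at common precisions
for bits in (16, 8, 4):
    print(f"{bits}-bit: {model_memory_gib(7e9, bits):.1f} GiB")
# 16-bit: 13.0 GiB
#  8-bit:  6.5 GiB
#  4-bit:  3.3 GiB
```

At 4-bit, the weights of a 7B model fit comfortably in the 8 GB+ recommendation above, which is what makes QLoRA practical on consumer GPUs.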

Dependencies

Python Packages

  • `bitsandbytes`: optional; enables the quantized (4-bit/8-bit) layer variants
  • `peft`, `transformers`, `accelerate`: core QLoRA stack (see Quick Install below)

Credentials

No additional credentials required beyond the core environment.

Quick Install

# Install bitsandbytes for quantized training
pip install bitsandbytes

# Full QLoRA setup
pip install peft bitsandbytes transformers accelerate

Code Evidence

Availability check from `src/peft/import_utils.py:24-25`:

@lru_cache
def is_bnb_available() -> bool:
    return importlib.util.find_spec("bitsandbytes") is not None
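The same `find_spec` + `lru_cache` pattern can be exercised with stdlib packages. This generic sketch is not PEFT code; the package names are placeholders chosen so it runs on any interpreter:

```python
import importlib.util
from functools import lru_cache

@lru_cache
def is_available(package: str) -> bool:
    # find_spec returns None when a top-level package cannot be imported,
    # without actually importing it
    return importlib.util.find_spec(package) is not None

print(is_available("json"))                # stdlib module: True
print(is_available("not_a_real_package"))  # not installed: False
print(is_available("json"))                # answered from the lru_cache, no re-probe
```

Caching matters here because the availability check is called on every guarded code path; `lru_cache` reduces repeated filesystem probes to a dictionary lookup.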

4-bit availability check from `src/peft/import_utils.py:29-35`:

@lru_cache
def is_bnb_4bit_available() -> bool:
    if not is_bnb_available():
        return False
    import bitsandbytes as bnb
    return hasattr(bnb.nn, "Linear4bit")

Conditional LoRA layer registration from `src/peft/tuners/lora/bnb.py:33`:

if is_bnb_available():
    class Linear8bitLt(torch.nn.Module, LoraLayer):
        # 8-bit quantized LoRA linear layer
        ...

4-bit LoRA layer from `src/peft/tuners/lora/bnb.py:309`:

if is_bnb_4bit_available():
    class Linear4bit(torch.nn.Module, LoraLayer):
        # 4-bit quantized LoRA linear layer
        ...
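The guard pattern in both snippets, defining a class only when its backend imports, can be sketched with a stdlib probe. Here `json` stands in for `bitsandbytes` so the example runs anywhere; the class name is invented for illustration:

```python
import importlib.util

def backend_available(name: str) -> bool:
    return importlib.util.find_spec(name) is not None

# json stands in for bitsandbytes: it is always importable, so the
# guarded class is defined on every interpreter
if backend_available("json"):
    class GuardedLinear:
        """Registered only when the optional backend can be imported."""
        backend = "json"

print(backend_available("json"))  # True
```

Because the class body never executes when the backend is absent, no import of the optional dependency is attempted at module load time; callers that need the class are expected to check availability first.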

Quantization detection in `src/peft/utils/other.py:149`:

loaded_in_kbit = getattr(model, "is_loaded_in_8bit", False) or getattr(
    model, "is_loaded_in_4bit", False
)
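This detection can be run in isolation with a stand-in object instead of a real `transformers` model (the `FakeQuantizedModel` class is invented for the demo; the flag names match what `transformers` sets):

```python
def loaded_in_kbit(model) -> bool:
    # Mirrors PEFT's check: transformers sets these flags when a model
    # is loaded with a BitsAndBytesConfig
    return getattr(model, "is_loaded_in_8bit", False) or getattr(
        model, "is_loaded_in_4bit", False
    )

class FakeQuantizedModel:
    is_loaded_in_4bit = True

print(loaded_in_kbit(FakeQuantizedModel()))  # True
print(loaded_in_kbit(object()))              # False: neither flag present
```

Using `getattr` with a `False` default keeps the check safe on plain (unquantized) models that never define either attribute.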

Common Errors

  • `ImportError: bitsandbytes`: bitsandbytes is not installed. Fix: `pip install bitsandbytes`.
  • `CUDA not available`: no NVIDIA GPU, or CUDA is not configured. Fix: install the CUDA toolkit and NVIDIA drivers.
  • `Linear4bit not found`: the installed bitsandbytes predates 4-bit support. Fix: upgrade with `pip install -U bitsandbytes`.
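For the `ImportError` case, a small guard can turn a missing optional dependency into an actionable message. This is a generic sketch; `require` is not a PEFT function:

```python
import importlib
from typing import Optional

def require(package: str, hint: Optional[str] = None):
    """Import an optional dependency or fail with an install hint."""
    try:
        return importlib.import_module(package)
    except ImportError as err:
        hint = hint or f"pip install {package}"
        raise ImportError(
            f"{package} is required for this feature; install it with `{hint}`"
        ) from err

json_mod = require("json")  # stdlib stand-in for an optional dependency
print(json_mod.__name__)    # json
```

Raising `from err` preserves the original traceback, so the real import failure (for example a missing CUDA library inside bitsandbytes) is still visible below the friendlier message.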

Compatibility Notes

  • Adapter methods with bnb support: LoRA, AdaLoRA, IA3, OFT, VeRA, RandLoRA, RoAd all have dedicated `bnb.py` files with quantized layer variants.
  • GPTQ models: detected separately via `quantization_method == "gptq"`; they do NOT use bitsandbytes layers.
  • fp32 upcasting: For non-GPTQ/AQLM/EETQ quantized models, `prepare_model_for_kbit_training` casts float16/bfloat16 params to float32 (except Params4bit).
