Environment:Huggingface Peft BitsAndBytes Quantization
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Quantization |
| Last Updated | 2026-02-07 06:44 GMT |
Overview
Optional bitsandbytes dependency for 4-bit and 8-bit quantized model training with PEFT adapters (QLoRA workflow).
Description
This environment adds the `bitsandbytes` library as an optional dependency for running quantized model fine-tuning. When available, PEFT registers quantized linear layer variants (`Linear8bitLt` for 8-bit, `Linear4bit` for 4-bit) for LoRA, AdaLoRA, IA3, OFT, VeRA, RandLoRA, and RoAd adapter methods. This enables the QLoRA workflow: loading a pre-trained model in 4-bit quantized format and attaching trainable LoRA adapters on top. Availability of `bitsandbytes` is detected via `importlib.util.find_spec("bitsandbytes")` and cached with `@lru_cache`.
Usage
Use this environment when you need to:
- Fine-tune models using QLoRA (4-bit quantized base + LoRA adapters)
- Run 8-bit training to reduce VRAM usage
- Call `prepare_model_for_kbit_training()` on a model loaded with `BitsAndBytesConfig`
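A minimal QLoRA setup sketch covering the steps above. The checkpoint name and LoRA hyperparameters (`r`, `lora_alpha`, `lora_dropout`) are illustrative assumptions, not recommendations; running it requires a CUDA GPU with `bitsandbytes` installed:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization with bfloat16 compute (typical QLoRA settings)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# "facebook/opt-350m" is just a small example checkpoint
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m", quantization_config=bnb_config, device_map="auto"
)

# Prepare the quantized model for k-bit training (gradient checkpointing,
# dtype adjustments) before attaching adapters
model = prepare_model_for_kbit_training(model)

# Attach trainable LoRA adapters on top of the frozen 4-bit base
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```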
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Hardware | NVIDIA GPU with CUDA support | Required by bitsandbytes |
| VRAM | 8GB+ recommended | 4-bit quantization significantly reduces memory |
| OS | Linux (preferred) | Windows via WSL2; limited native Windows support |
Dependencies
Python Packages
- `bitsandbytes` (any version enables 8-bit; 4-bit requires a version that provides `bnb.nn.Linear4bit`)
- All core PEFT dependencies (see Environment:Huggingface_Peft_Python_Core_Dependencies)
Credentials
No additional credentials required beyond the core environment.
Quick Install
```shell
# Install bitsandbytes for quantized training
pip install bitsandbytes

# Full QLoRA setup
pip install peft bitsandbytes transformers accelerate
```
Code Evidence
Availability check from `src/peft/import_utils.py:24-25`:

```python
@lru_cache
def is_bnb_available() -> bool:
    return importlib.util.find_spec("bitsandbytes") is not None
```
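The same cached `find_spec` pattern works for probing any optional dependency; a standalone sketch (the function name and package names here are arbitrary):

```python
import importlib.util
from functools import lru_cache


@lru_cache
def is_package_available(name: str) -> bool:
    # find_spec returns None when the package cannot be imported;
    # lru_cache ensures the filesystem probe runs at most once per name
    return importlib.util.find_spec(name) is not None


print(is_package_available("json"))                  # True: stdlib, always present
print(is_package_available("definitely_not_a_pkg"))  # False: missing package
```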
4-bit availability check from `src/peft/import_utils.py:29-35`:

```python
@lru_cache
def is_bnb_4bit_available() -> bool:
    if not is_bnb_available():
        return False

    import bitsandbytes as bnb

    return hasattr(bnb.nn, "Linear4bit")
```
Conditional LoRA layer registration from `src/peft/tuners/lora/bnb.py:33`:

```python
if is_bnb_available():

    class Linear8bitLt(torch.nn.Module, LoraLayer):
        # 8-bit quantized LoRA linear layer
        ...
```
4-bit LoRA layer from `src/peft/tuners/lora/bnb.py:309`:

```python
if is_bnb_4bit_available():

    class Linear4bit(torch.nn.Module, LoraLayer):
        # 4-bit quantized LoRA linear layer
        ...
```
Quantization detection in `src/peft/utils/other.py:149`:

```python
loaded_in_kbit = getattr(model, "is_loaded_in_8bit", False) or getattr(
    model, "is_loaded_in_4bit", False
)
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ImportError: bitsandbytes` | bitsandbytes not installed | `pip install bitsandbytes` |
| `CUDA not available` | No NVIDIA GPU or CUDA not configured | Install CUDA toolkit and NVIDIA drivers |
| `Linear4bit not found` | Old version of bitsandbytes without 4-bit support | Upgrade to latest bitsandbytes |
Compatibility Notes
- Adapter methods with bnb support: LoRA, AdaLoRA, IA3, OFT, VeRA, RandLoRA, RoAd all have dedicated `bnb.py` files with quantized layer variants.
- GPTQ models: detected separately via `quantization_method == "gptq"`; they do NOT use bitsandbytes layers.
- fp32 upcasting: For non-GPTQ/AQLM/EETQ quantized models, `prepare_model_for_kbit_training` casts float16/bfloat16 params to float32 (except Params4bit).
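A sketch of that upcasting rule with plain torch parameters. The helper name is invented, and the `Params4bit` exclusion is represented here by a class-name check (an assumption about how to skip bnb's packed 4-bit storage), so this is an illustration of the rule, not PEFT's actual implementation:

```python
import torch


def upcast_kbit_params(model: torch.nn.Module) -> None:
    # Half-precision floats are upcast to fp32 for training stability;
    # 4-bit quantized weights (bnb's Params4bit) keep their packed storage.
    for param in model.parameters():
        if param.__class__.__name__ == "Params4bit":
            continue  # quantized storage must not be upcast
        if param.dtype in (torch.float16, torch.bfloat16):
            param.data = param.data.to(torch.float32)


layer = torch.nn.Linear(4, 4).to(torch.float16)
upcast_kbit_params(layer)
print(layer.weight.dtype)  # torch.float32
```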