Environment:Huggingface Alignment handbook BitsAndBytes CUDA
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Deep_Learning, Optimization |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
CUDA-accelerated environment with BitsAndBytes >= 0.46.1 providing 4-bit NF4 quantization for QLoRA training on consumer GPUs.
Description
BitsAndBytes provides the 4-bit quantization backend used by the QLoRA training path in the alignment-handbook. When `load_in_4bit: true` is set in the recipe config, TRL's `get_quantization_config` creates a `BitsAndBytesConfig` that quantizes model weights to 4-bit NF4 format. This enables fine-tuning 7B+ parameter models on a single 24 GB consumer GPU (RTX 4090).
The alignment-handbook also uses the BitsAndBytes `paged_adamw_32bit` optimizer in its QLoRA DPO configs. Paged optimizers can move optimizer state to CPU memory when GPU memory runs short, which reduces the risk of OOM errors during memory spikes.
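As a rough illustration of the 4-bit NF4 setup described above, the sketch below assembles keyword arguments for `transformers.BitsAndBytesConfig`. The helper names `nf4_quantization_kwargs` and `build_quantization_config` are hypothetical, and the chosen values (bfloat16 compute dtype, nested quantization off) are assumptions; TRL derives the actual values from the model arguments and may set additional fields.

```python
# Sketch only: field values are illustrative assumptions, not copied from TRL.
def nf4_quantization_kwargs(compute_dtype: str = "bfloat16",
                            double_quant: bool = False) -> dict:
    """Keyword arguments for a 4-bit NF4 BitsAndBytesConfig."""
    return {
        "load_in_4bit": True,                     # store weights in 4 bits
        "bnb_4bit_quant_type": "nf4",             # NormalFloat4 data type
        "bnb_4bit_compute_dtype": compute_dtype,  # dtype used for matmuls
        "bnb_4bit_use_double_quant": double_quant,
    }


def build_quantization_config(**overrides):
    """Build the config object; needs transformers and torch installed."""
    from transformers import BitsAndBytesConfig  # imported lazily on purpose

    return BitsAndBytesConfig(**nf4_quantization_kwargs(**overrides))


if __name__ == "__main__":
    # Printing the kwargs works even without transformers installed.
    print(nf4_quantization_kwargs())
```

The resulting object would be passed as `quantization_config` when loading the model, which is the role the value returned by `get_quantization_config` plays in the handbook.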
Usage
Use this environment when running QLoRA fine-tuning workflows. It is required by the Get_Model_Quantized implementation and is only needed when `load_in_4bit: true` is set in the recipe config.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Hardware | NVIDIA GPU with CUDA support | BitsAndBytes requires CUDA for quantization kernels |
| Hardware | Minimum 24GB VRAM for 7B models | Tested on RTX 4090 (24GB) for QLoRA |
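To see why 4-bit quantization makes a 7B model fit on a 24 GB card, the back-of-the-envelope sketch below estimates weight storage alone. It is an assumption-laden estimate: it ignores quantization constants, activations, LoRA adapter weights, optimizer state, and the CUDA context, all of which add to real usage. The function name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights only, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


PARAMS_7B = 7e9  # nominal parameter count for a "7B" model

bf16_gib = weight_memory_gib(PARAMS_7B, 16)  # 16-bit weights
nf4_gib = weight_memory_gib(PARAMS_7B, 4)    # 4-bit NF4 weights

if __name__ == "__main__":
    print(f"bf16 weights: {bf16_gib:.2f} GiB")  # 13.04 GiB
    print(f"nf4 weights:  {nf4_gib:.2f} GiB")   # 3.26 GiB
```

Under these assumptions the quantized weights take about a quarter of the 16-bit footprint, leaving the rest of the 24 GB for activations, adapters, and optimizer state.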
Dependencies
Python Packages
- `bitsandbytes` >= 0.46.1
- `torch` >= 2.6.0 (peer dependency)
- `peft` >= 0.16.0 (for LoRA adapter injection on quantized models)
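The minimum versions listed above can be checked against the active environment with a small script. This is a minimal sketch using only the standard library; `release_tuple` and `check_minimums` are hypothetical helper names, and the comparison looks only at the leading numeric part of each version, so pre-release and local suffixes such as `.dev0` or `+cu124` are ignored.

```python
import re
from importlib import metadata

# Minimum versions taken from the dependency list above.
MINIMUMS = {"bitsandbytes": "0.46.1", "torch": "2.6.0", "peft": "0.16.0"}


def release_tuple(version: str) -> tuple:
    """Leading numeric release segment, e.g. '2.6.0+cu124' -> (2, 6, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def check_minimums(minimums=MINIMUMS) -> dict:
    """Map each package name to 'ok', 'missing', or a 'too old' message."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if release_tuple(installed) >= release_tuple(minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for package, status in check_minimums().items():
        print(f"{package}: {status}")
```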
Credentials
No additional credentials required.
Quick Install
# Installed as part of alignment-handbook
uv pip install .
# Or install standalone
pip install "bitsandbytes>=0.46.1"
Code Evidence
BitsAndBytes version requirement from `setup.py:45`:
"bitsandbytes>=0.46.1",
Quantization config usage in `src/alignment/model_utils.py:42,49`:
quantization_config = get_quantization_config(model_args)
model_kwargs = dict(
...
device_map=get_kbit_device_map() if quantization_config is not None else None,
quantization_config=quantization_config,
)
QLoRA config from `recipes/zephyr-7b-beta/sft/config_qlora.yaml:8`:
load_in_4bit: true
Paged optimizer from `recipes/zephyr-7b-beta/dpo/config_qlora.yaml:48`:
optim: paged_adamw_32bit
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: No module named 'bitsandbytes'` | BitsAndBytes not installed | `pip install "bitsandbytes>=0.46.1"` |
| `RuntimeError: CUDA Setup failed` | CUDA libraries not found by bitsandbytes | Ensure the CUDA toolkit is installed and `LD_LIBRARY_PATH` includes the CUDA libs; `python -m bitsandbytes` prints diagnostics |
| `CUDA out of memory` with QLoRA | Per-device batch size too high even with quantization | Reduce `per_device_train_batch_size` and raise `gradient_accumulation_steps` to keep the effective batch size unchanged |
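The batch-size adjustment for the out-of-memory case can be sketched as simple arithmetic. Lowering the per-device batch size is what reduces peak memory; raising gradient accumulation only compensates so the effective batch size stays the same. The helper names `effective_batch` and `rebalance` are hypothetical, and the halving strategy is one assumed choice among many.

```python
from typing import Tuple


def effective_batch(per_device_batch: int, grad_accum: int,
                    num_devices: int = 1) -> int:
    """Samples contributing to each optimizer step."""
    return per_device_batch * grad_accum * num_devices


def rebalance(per_device_batch: int, grad_accum: int) -> Tuple[int, int]:
    """Halve the per-device batch and double the accumulation steps."""
    if per_device_batch < 2 or per_device_batch % 2:
        raise ValueError("per-device batch size must be an even number >= 2")
    return per_device_batch // 2, grad_accum * 2


if __name__ == "__main__":
    batch, accum = 4, 2
    new_batch, new_accum = rebalance(batch, accum)
    # Effective batch size is unchanged: 8 before and after.
    print(effective_batch(batch, accum), effective_batch(new_batch, new_accum))
```

The two returned values correspond to `per_device_train_batch_size` and `gradient_accumulation_steps` in the recipe config.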
Compatibility Notes
- CUDA only: BitsAndBytes quantization requires NVIDIA GPUs. AMD ROCm support may be available in newer versions but is not tested in the alignment-handbook.
- paged_adamw_32bit: The BitsAndBytes paged optimizer is used specifically for QLoRA DPO training to manage GPU memory.
- Single GPU vs Multi-GPU: QLoRA with BitsAndBytes is primarily tested on a single GPU (using the DDP config). Multi-GPU QLoRA can use the FSDP config.