Environment:Huggingface Alignment handbook BitsAndBytes CUDA

Knowledge Sources	Alignment Handbook BitsAndBytes QLoRA
Domains	Infrastructure, Deep_Learning, Optimization
Last Updated	2026-02-07 00:00 GMT

Overview

CUDA-accelerated environment with BitsAndBytes >= 0.46.1 providing 4-bit NF4 quantization for QLoRA training on consumer GPUs.

Description

BitsAndBytes provides the 4-bit quantization backend used by the QLoRA training path in the alignment-handbook. When load_in_4bit: true is set in the recipe config, TRL's get_quantization_config creates a BitsAndBytesConfig that quantizes model weights to 4-bit NF4 format. This enables fine-tuning 7B+ parameter models on a single 24GB consumer GPU (RTX 4090).
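
As a rough illustration of what a 4-bit NF4 configuration looks like, the sketch below builds a `BitsAndBytesConfig` directly with `transformers`. It is not TRL's `get_quantization_config`. The compute dtype and double-quantization settings are assumptions chosen for illustration, and `NF4_KWARGS` and `build_nf4_config` are hypothetical names.

```python
# Illustrative NF4 settings. Only load_in_4bit and the NF4 quant type come from
# this page; the compute dtype and double quantization are assumptions.
NF4_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_compute_dtype": "bfloat16",  # assumed; given as a dtype name string
    "bnb_4bit_use_double_quant": True,     # assumed; not stated on this page
}


def build_nf4_config():
    """Hypothetical helper: build a BitsAndBytesConfig from NF4_KWARGS."""
    # Imported lazily so the settings can be inspected without transformers installed.
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(**NF4_KWARGS)
```

The resulting object would be passed as `quantization_config=` when loading the model, in the same position as the `quantization_config` entry of `model_kwargs` under Code Evidence.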

The QLoRA DPO configs in the alignment-handbook also use the BitsAndBytes paged_adamw_32bit optimizer. Paged optimizers allow optimizer state to be paged out to CPU memory when GPU memory spikes, which reduces the risk of OOM errors.

Usage

Use this environment when running QLoRA fine-tuning workflows. It is required by the Get_Model_Quantized implementation and is only needed when load_in_4bit: true is set in the recipe config.

System Requirements

Category	Requirement	Notes
Hardware	NVIDIA GPU with CUDA support	BitsAndBytes requires CUDA for quantization kernels
Hardware	Minimum 24GB VRAM for 7B models	Tested on RTX 4090 (24GB) for QLoRA
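
To see why 4-bit quantization makes a 7B model plausible on a 24GB card, the back-of-the-envelope sketch below compares weight storage at 4 and 16 bits per parameter. It is an estimate only: it assumes a nominal 7e9 parameters and ignores quantization constants, LoRA adapter weights, activations, optimizer state, and CUDA overhead, all of which consume additional VRAM. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params_7b = 7e9  # nominal parameter count for a "7B" model (assumption)
nf4_gb = weight_memory_gb(params_7b, 4)    # 4-bit quantized weights
bf16_gb = weight_memory_gb(params_7b, 16)  # unquantized 16-bit weights

print(f"4-bit weights:  ~{nf4_gb:.1f} GB")
print(f"16-bit weights: ~{bf16_gb:.1f} GB")
```

Under these assumptions the quantized weights take roughly a quarter of the 16-bit footprint, leaving the rest of the card for activations and training state.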

Dependencies

Python Packages

`bitsandbytes` >= 0.46.1
`torch` >= 2.6.0 (peer dependency)
`peft` >= 0.16.0 (for LoRA adapter injection on quantized models)
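
A quick way to confirm that an environment meets these minimums is to read the installed versions with the standard library. The sketch below is a simplified check, not part of the alignment-handbook: `MINIMUMS`, `version_tuple`, and `check_minimums` are hypothetical names, and the parser only compares leading numeric release components, so pre-release suffixes such as `rc1` are ignored.

```python
from importlib import metadata

# Minimum versions listed above.
MINIMUMS = {"bitsandbytes": "0.46.1", "torch": "2.6.0", "peft": "0.16.0"}


def version_tuple(version: str) -> tuple:
    """Parse leading numeric release parts, e.g. '2.6.0+cu124' -> (2, 6, 0)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def check_minimums(minimums=MINIMUMS) -> dict:
    """Return {package: (installed_version_or_None, meets_minimum)}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, version_tuple(installed) >= version_tuple(minimum))
    return report


if __name__ == "__main__":
    for pkg, (installed, ok) in check_minimums().items():
        print(f"{pkg}: installed={installed} meets_minimum={ok}")
```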

Credentials

No additional credentials required.

Quick Install

# Installed as part of alignment-handbook
uv pip install .

# Or install standalone
pip install "bitsandbytes>=0.46.1"

Code Evidence

BitsAndBytes version requirement from `setup.py:45`:

    "bitsandbytes>=0.46.1",

Quantization config usage in `src/alignment/model_utils.py:42,49`:

    quantization_config = get_quantization_config(model_args)
    model_kwargs = dict(
        ...
        device_map=get_kbit_device_map() if quantization_config is not None else None,
        quantization_config=quantization_config,
    )

QLoRA config from `recipes/zephyr-7b-beta/sft/config_qlora.yaml:8`:

load_in_4bit: true

Paged optimizer from `recipes/zephyr-7b-beta/dpo/config_qlora.yaml:48`:

optim: paged_adamw_32bit

Common Errors

Error Message	Cause	Solution
`ModuleNotFoundError: No module named 'bitsandbytes'`	BitsAndBytes not installed	`pip install "bitsandbytes>=0.46.1"`
`RuntimeError: CUDA Setup failed`	CUDA toolkit not found by bitsandbytes	Ensure CUDA toolkit is installed and LD_LIBRARY_PATH includes CUDA libs
`CUDA out of memory` with QLoRA	Batch size too high even with quantization	Reduce per_device_train_batch_size or increase gradient_accumulation_steps
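
For the out-of-memory case, lowering per_device_train_batch_size while raising gradient_accumulation_steps keeps the effective batch size unchanged. The sketch below shows the arithmetic. The numbers are illustrative and not taken from any recipe, and `effective_batch_size` and `rebalance_accumulation` are hypothetical helpers.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Samples contributing to each optimizer step."""
    return per_device * grad_accum * num_gpus


def rebalance_accumulation(per_device: int, grad_accum: int, new_per_device: int) -> int:
    """Return the gradient_accumulation_steps that preserves the effective
    batch size after changing the per-device batch size."""
    total = per_device * grad_accum
    if new_per_device <= 0 or total % new_per_device != 0:
        raise ValueError("new per-device batch size must evenly divide the total")
    return total // new_per_device


# Illustrative values: shrink the per-device batch from 4 to 1.
old_per_device, old_accum = 4, 2
new_per_device = 1
new_accum = rebalance_accumulation(old_per_device, old_accum, new_per_device)

print(effective_batch_size(old_per_device, old_accum))  # before the change
print(effective_batch_size(new_per_device, new_accum))  # after the change
```

Smaller per-device batches lower peak activation memory at the cost of more forward and backward passes per optimizer step.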

Compatibility Notes

CUDA only: BitsAndBytes quantization requires NVIDIA GPUs. AMD ROCm support may be available in newer versions but is not tested in the alignment-handbook.
paged_adamw_32bit: The BitsAndBytes paged optimizer is used specifically for QLoRA DPO training to manage GPU memory.
Single GPU vs Multi-GPU: QLoRA with BitsAndBytes is primarily tested on a single GPU (DDP config). Multi-GPU QLoRA can use the FSDP config.

Related Pages

Implementation:Huggingface_Alignment_handbook_Get_Model_Quantized
