Environment:Huggingface Alignment handbook BitsAndBytes CUDA

From Leeroopedia
Revision as of 18:44, 16 February 2026 by Admin (talk | contribs) (Auto-imported from environments/Huggingface_Alignment_handbook_BitsAndBytes_CUDA.md)


Knowledge Sources
Domains Infrastructure, Deep_Learning, Optimization
Last Updated 2026-02-07 00:00 GMT

Overview

CUDA-accelerated environment with BitsAndBytes >= 0.46.1 providing 4-bit NF4 quantization for QLoRA training on consumer GPUs.

Description

BitsAndBytes provides the 4-bit quantization backend used by the QLoRA training path in the alignment-handbook. When `load_in_4bit: true` is set in the recipe config, TRL's `get_quantization_config` creates a `BitsAndBytesConfig` that quantizes the model weights to the 4-bit NF4 format. This makes it possible to fine-tune models with 7B or more parameters on a single 24GB consumer GPU such as an RTX 4090.
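As a rough illustration of the config described above, the sketch below builds a 4-bit NF4 `BitsAndBytesConfig` by hand. The helper names `nf4_quantization_kwargs` and `build_quantization_config` are hypothetical, and the chosen compute dtype and the double-quantization setting are assumptions; the exact fields that TRL's `get_quantization_config` sets may differ between TRL versions.

```python
def nf4_quantization_kwargs(compute_dtype: str = "bfloat16") -> dict:
    """Keyword arguments for a 4-bit NF4 BitsAndBytesConfig (illustrative values)."""
    return {
        "load_in_4bit": True,                     # store linear-layer weights in 4 bits
        "bnb_4bit_quant_type": "nf4",             # NormalFloat4 data type
        "bnb_4bit_compute_dtype": compute_dtype,  # dtype used for compute after dequantization
        "bnb_4bit_use_double_quant": False,       # assumption: nested quantization disabled
    }


def build_quantization_config(compute_dtype: str = "bfloat16"):
    """Build the config object. Requires transformers and torch to be installed."""
    # Imported lazily so the kwargs helper above stays usable without GPU libraries.
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(**nf4_quantization_kwargs(compute_dtype))
```

The resulting object would typically be passed as `quantization_config=` to `from_pretrained`, which is what the handbook's `model_kwargs` does with the value returned by TRL.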

The alignment-handbook also uses the BitsAndBytes `paged_adamw_32bit` optimizer in its QLoRA DPO configs. Paged optimizers can move optimizer state between GPU and CPU memory when the GPU runs short, which reduces the risk of OOM errors during memory spikes.

Usage

Use this environment when running QLoRA fine-tuning workflows. It is required by the Get_Model_Quantized implementation and is only needed when `load_in_4bit: true` is set in the recipe config.

System Requirements

| Category | Requirement | Notes |
|----------|-------------|-------|
| Hardware | NVIDIA GPU with CUDA support | BitsAndBytes requires CUDA for quantization kernels |
| Hardware | Minimum 24GB VRAM for 7B models | Tested on RTX 4090 (24GB) for QLoRA |
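A back-of-the-envelope calculation helps explain why 4-bit weights fit a 7B model into 24GB. The sketch below counts weight storage only; it ignores quantization constants, LoRA adapters, activations, and optimizer state, so real usage is higher. The helper name `weight_memory_gib` and the round figure of 7e9 parameters are assumptions made for illustration.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / GIB


params_7b = 7e9  # assumed parameter count for a "7B" model

fp16_gib = weight_memory_gib(params_7b, 16)  # roughly 13 GiB
nf4_gib = weight_memory_gib(params_7b, 4)    # roughly 3.3 GiB

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f"4-bit weights:  {nf4_gib:.1f} GiB")
```

Under these assumptions the quantized weights take about a quarter of the 16-bit footprint, leaving most of the 24GB for activations, adapters, and optimizer state.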

Dependencies

Python Packages

  • `bitsandbytes` >= 0.46.1
  • `torch` >= 2.6.0 (peer dependency)
  • `peft` >= 0.16.0 (for LoRA adapter injection on quantized models)
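The minimum versions listed above can be checked before launching a run. The following is a minimal sketch using only the standard library; `version_tuple` and `check_environment` are hypothetical helpers, and the version parsing is deliberately simplified (it compares leading numeric release parts and ignores pre-release ordering).

```python
from importlib import metadata

# Minimum versions taken from the dependency list above.
MINIMUMS = {"bitsandbytes": "0.46.1", "torch": "2.6.0", "peft": "0.16.0"}


def version_tuple(version: str) -> tuple:
    """Leading numeric release parts, e.g. '2.6.0+cu124' -> (2, 6, 0)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric segment such as 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def check_environment(minimums: dict = MINIMUMS) -> dict:
    """Map each package to 'missing', 'ok', or 'too old' for this interpreter."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = version_tuple(installed) >= version_tuple(minimum)
        report[name] = "ok" if ok else "too old"
    return report


for package, status in check_environment().items():
    print(f"{package}: {status}")
```

For strict PEP 440 comparisons, a dedicated version-parsing library would be a better fit than this simplified tuple comparison.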

Credentials

No additional credentials required.

Quick Install

# Installed as part of alignment-handbook
uv pip install .

# Or install standalone
pip install "bitsandbytes>=0.46.1"

Code Evidence

BitsAndBytes version requirement from `setup.py:45`:

    "bitsandbytes>=0.46.1",

Quantization config usage in `src/alignment/model_utils.py:42,49`:

    quantization_config = get_quantization_config(model_args)
    model_kwargs = dict(
        ...
        device_map=get_kbit_device_map() if quantization_config is not None else None,
        quantization_config=quantization_config,
    )

QLoRA config from `recipes/zephyr-7b-beta/sft/config_qlora.yaml:8`:

load_in_4bit: true

Paged optimizer from `recipes/zephyr-7b-beta/dpo/config_qlora.yaml:48`:

optim: paged_adamw_32bit

Common Errors

| Error Message | Cause | Solution |
|---------------|-------|----------|
| `ImportError: No module named 'bitsandbytes'` | BitsAndBytes not installed | `pip install "bitsandbytes>=0.46.1"` |
| `RuntimeError: CUDA Setup failed` | CUDA toolkit not found by bitsandbytes | Ensure the CUDA toolkit is installed and `LD_LIBRARY_PATH` includes the CUDA libraries |
| `CUDA out of memory` with QLoRA | Batch size too high even with quantization | Reduce `per_device_train_batch_size` and raise `gradient_accumulation_steps` to keep the effective batch size unchanged |
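For the out-of-memory case, lowering the per-device batch size reduces peak memory, while raising gradient accumulation keeps the effective batch size the same. The sketch below shows the arithmetic; `rebalance_accumulation` is a hypothetical helper, not part of the alignment-handbook or TRL.

```python
def rebalance_accumulation(per_device_batch: int, grad_accum: int,
                           new_per_device_batch: int) -> int:
    """Return the gradient_accumulation_steps that preserves the per-device
    effective batch size after shrinking per_device_train_batch_size."""
    if new_per_device_batch <= 0:
        raise ValueError("new_per_device_batch must be positive")
    effective = per_device_batch * grad_accum
    if effective % new_per_device_batch != 0:
        raise ValueError(
            f"effective batch {effective} is not divisible by {new_per_device_batch}"
        )
    return effective // new_per_device_batch


# Example: batch 4 with 4 accumulation steps, shrunk to batch 1.
print(rebalance_accumulation(4, 4, 1))
```

The returned value would go into `gradient_accumulation_steps` in the recipe config alongside the smaller `per_device_train_batch_size`.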

Compatibility Notes

  • CUDA only: BitsAndBytes quantization requires NVIDIA GPUs. AMD ROCm support may be available in newer versions but is not tested in the alignment-handbook.
  • paged_adamw_32bit: The BitsAndBytes paged optimizer is used specifically for QLoRA DPO training to manage GPU memory.
  • Single GPU vs Multi-GPU: QLoRA with BitsAndBytes is primarily tested on a single GPU (DDP config). Multi-GPU QLoRA can use the FSDP config.
