# Environment: Hugging Face Alignment Handbook (PyTorch + CUDA)

| Knowledge Sources | Value |
|---|---|
| Domains | Infrastructure, Deep_Learning |
| Last Updated | 2026-02-07 00:00 GMT |
## Overview
Linux environment with CUDA 12.6, Python 3.10+, PyTorch 2.6.0+, and bfloat16 support for GPU-accelerated LLM training.
## Description
This environment provides the core GPU-accelerated runtime required by all alignment-handbook training scripts (SFT, DPO, ORPO). It is built around PyTorch >= 2.6.0 with CUDA 12.6 support, running on Python >= 3.10.9. All training recipes use bfloat16 mixed precision and require NVIDIA GPUs with sufficient VRAM for the target model size.
The README specifies installing PyTorch with the CUDA 12.6 index (cu126) and notes that the precise version is important for reproducibility.
## Usage
Use this environment for any training or fine-tuning workflow in the alignment-handbook. It is the mandatory prerequisite for running all three training scripts (sft.py, dpo.py, orpo.py) with full-precision model weights.
## System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu recommended) | Tested on Linux; README uses apt-get for git-lfs |
| Hardware | NVIDIA GPU | Full fine-tuning tested on 8x A100 (80GB); QLoRA tested on RTX 4090 (24GB) |
| Hardware | Multi-GPU node | Full fine-tuning uses 8 GPUs; QLoRA works on single GPU |
| Python | >= 3.10.9 | Specified in setup.py python_requires |
| Disk | 50GB+ SSD | Model checkpoints and dataset caching require significant storage |
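A minimal preflight sketch for the requirements above, using only the standard library so it runs anywhere. The `meets_python_requirement` helper name is an assumption for illustration; the `nvidia-smi` lookup is only a rough proxy for a working NVIDIA driver, not a substitute for `torch.cuda.is_available()`.

```python
import shutil
import sys

# Minimum Python version from setup.py's python_requires (>= 3.10.9).
MIN_PYTHON = (3, 10, 9)

def meets_python_requirement(version=sys.version_info[:3], minimum=MIN_PYTHON):
    """Return True if the interpreter satisfies the minimum version."""
    return tuple(version) >= tuple(minimum)

if __name__ == "__main__":
    print("Python OK:", meets_python_requirement())
    # nvidia-smi on PATH is a rough proxy for a usable NVIDIA driver.
    print("nvidia-smi found:", shutil.which("nvidia-smi") is not None)
```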
## Dependencies

### System Packages
- `git-lfs` (for pushing models to HuggingFace Hub)
- CUDA Toolkit 12.6 (matching PyTorch whl index)
### Python Packages (Core)
- `torch` >= 2.6.0
- `transformers` >= 4.53.3
- `accelerate` >= 1.9.0
- `datasets` >= 4.0.0
- `numpy` >= 1.24.2
- `safetensors` >= 0.5.3
- `packaging` >= 23.0
- `tqdm` >= 4.64.1
- `jinja2` >= 3.0.0
- `scipy`
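The version floors above can be checked programmatically. This is a hedged sketch using only the standard library: the dotted-numeric comparison in `version_at_least` is deliberately naive (it ignores suffixes like `.post1`), so a real install check should use `packaging.version.Version` instead. The helper names and the subset of packages checked are illustrative choices, not part of the handbook.

```python
from importlib import metadata

# A subset of the minimums from the core dependency list above.
CORE_MINIMUMS = {
    "torch": "2.6.0",
    "transformers": "4.53.3",
    "accelerate": "1.9.0",
    "datasets": "4.0.0",
}

def version_at_least(installed, minimum):
    """Naive dotted-numeric comparison; prefer packaging.version.Version
    in real code, which handles pre/post-release suffixes correctly."""
    to_tuple = lambda v: tuple(int(p) for p in v.split(".") if p.isdigit())
    return to_tuple(installed) >= to_tuple(minimum)

def check_core_packages(minimums=CORE_MINIMUMS):
    """Map each package to True/False depending on whether an installed
    version satisfying the minimum was found."""
    results = {}
    for name, minimum in minimums.items():
        try:
            results[name] = version_at_least(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            results[name] = False  # package not installed at all
    return results
```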
### Python Packages (Optional/Recommended)

- `flash-attn` == 2.7.4.post1 (Flash Attention 2; installed separately with `--no-build-isolation`)
- `wandb` (Weights & Biases logging)
- `tensorboard` (alternative logging)
## Credentials
The following environment variables or login steps are required:
- HuggingFace Login: Run `huggingface-cli login` to authenticate for model/dataset downloads and Hub pushes.
- `HF_TOKEN`: Alternatively, set this environment variable with your HuggingFace API token.
- `WANDB_API_KEY`: Weights & Biases API key (if using `report_to: wandb` in config).
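The two authentication paths above can be combined: prefer an explicit `HF_TOKEN` environment variable and otherwise fall back to the cached credentials written by `huggingface-cli login`. The helper below is a hypothetical sketch (the name `resolve_hf_token` is not part of the handbook); `huggingface_hub.login` does accept `token=None` and will then use cached or interactive login.

```python
import os

def resolve_hf_token(env=os.environ):
    """Hypothetical helper: return HF_TOKEN if set, else None so that
    huggingface_hub falls back to the cached `huggingface-cli login`
    credentials."""
    return env.get("HF_TOKEN") or None

# Usage sketch:
#   from huggingface_hub import login
#   login(token=resolve_hf_token())
```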
## Quick Install

```shell
# Create a virtual environment
uv venv handbook --python 3.11 && source handbook/bin/activate && uv pip install --upgrade pip

# Install PyTorch with CUDA 12.6
uv pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu126

# Install alignment-handbook and all dependencies
uv pip install .

# Install Flash Attention 2 (requires torch to be installed first)
uv pip install "flash-attn==2.7.4.post1" --no-build-isolation

# Log in to the Hugging Face Hub
huggingface-cli login

# Install git-lfs for pushing models
sudo apt-get install git-lfs
```
## Code Evidence

Python version requirement from `setup.py:136`:

```python
python_requires=">=3.10.9",
```

PyTorch version requirement from `setup.py:67`:

```python
"torch>=2.6.0",
```

CUDA/bfloat16 usage evidence from `src/alignment/model_utils.py:39-41`:

```python
torch_dtype = (
    model_args.torch_dtype if model_args.torch_dtype in ["auto", None] else getattr(torch, model_args.torch_dtype)
)
```
All recipe configs specify `bf16: true` and `torch_dtype: bfloat16`, requiring NVIDIA GPUs with bfloat16 support (Ampere or newer).
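The dtype-resolution logic in `model_utils.py` can be sketched without torch installed. The snippet below substitutes a hypothetical stand-in object for the torch module so it runs anywhere; in the real code, `getattr` is called on `torch` itself, mapping the config string `"bfloat16"` to `torch.bfloat16`.

```python
from types import SimpleNamespace

# Stand-in for the torch module so the sketch runs without a GPU stack.
torch_stub = SimpleNamespace(bfloat16="torch.bfloat16", float16="torch.float16")

def resolve_torch_dtype(requested, torch_mod=torch_stub):
    # "auto" and None pass through unchanged; any other string is looked
    # up as an attribute, e.g. "bfloat16" -> torch.bfloat16.
    return requested if requested in ("auto", None) else getattr(torch_mod, requested)
```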
README installation instructions from `README.md:73-79`:
Next, install PyTorch `v2.6.0`
uv pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu126
Note that the precise version is important for reproducibility!
## Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `CUDA out of memory` | Model too large for GPU VRAM | Use QLoRA (load_in_4bit: true) or switch to multi-GPU with ZeRO-3 |
| `RuntimeError: FlashAttention only supports Ampere GPUs or newer` | GPU does not support Flash Attention 2 | Remove `attn_implementation: flash_attention_2` from config or upgrade GPU |
| `ImportError: No module named 'flash_attn'` | Flash Attention not installed | Run `uv pip install "flash-attn==2.7.4.post1" --no-build-isolation` |
| `torch.cuda.is_available()` returns `False` | CUDA not properly installed, or CPU-only wheel installed | Reinstall PyTorch from the matching CUDA wheel index (e.g. cu126) |
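For the OOM case, the switch to QLoRA is a config change rather than a code change. A hypothetical recipe excerpt (key names follow the handbook's PEFT/quantization model arguments; the LoRA rank and alpha values are illustrative, not prescribed):

```yaml
# Hypothetical recipe excerpt: 4-bit QLoRA settings to reduce VRAM usage
load_in_4bit: true
use_peft: true
lora_r: 16
lora_alpha: 32
bf16: true
torch_dtype: bfloat16
```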
## Compatibility Notes
- bfloat16: All recipes use `bf16: true`. This requires NVIDIA Ampere (A100) or newer GPUs. Older GPUs (V100) support float16 but not bfloat16 natively.
- Flash Attention 2: Requires Ampere or newer GPUs. The `--no-build-isolation` flag is required during installation.
- Multi-GPU: Full fine-tuning uses DeepSpeed ZeRO-3 (8 GPUs). QLoRA works on a single 24GB GPU.
- PyTorch version: The README emphasizes using the exact version (2.6.0) for reproducibility.