# Environment: Hugging Face Alignment Handbook (PyTorch + CUDA)

| Knowledge Sources | Value |
|---|---|
| Domains | Infrastructure, Deep_Learning |
| Last Updated | 2026-02-07 00:00 GMT |
## Overview
Linux environment with CUDA 12.6, Python 3.10+, PyTorch 2.6.0+, and bfloat16 support for GPU-accelerated LLM training.
## Description
This environment provides the core GPU-accelerated runtime required by all alignment-handbook training scripts (SFT, DPO, ORPO). It is built around PyTorch >= 2.6.0 with CUDA 12.6 support, running on Python >= 3.10.9. All training recipes use bfloat16 mixed precision and require NVIDIA GPUs with sufficient VRAM for the target model size.
The README specifies installing PyTorch with the CUDA 12.6 index (cu126) and notes that the precise version is important for reproducibility.
## Usage
Use this environment for any training or fine-tuning workflow in the alignment-handbook. It is the mandatory prerequisite for running all three training scripts (sft.py, dpo.py, orpo.py) with full-precision model weights.
## System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu recommended) | Tested on Linux; README uses apt-get for git-lfs |
| Hardware | NVIDIA GPU | Full fine-tuning tested on 8x A100 (80GB); QLoRA tested on RTX 4090 (24GB) |
| Hardware | Multi-GPU node | Full fine-tuning uses 8 GPUs; QLoRA works on single GPU |
| Python | >= 3.10.9 | Specified in setup.py python_requires |
| Disk | 50GB+ SSD | Model checkpoints and dataset caching require significant storage |
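A minimal preflight sketch for the requirements above, using only the standard library so it runs anywhere. The `meets_python_requirement` helper name is an assumption for illustration; the `nvidia-smi` lookup is only a rough proxy for a working NVIDIA driver, not a substitute for `torch.cuda.is_available()`.

```python
import shutil
import sys

# Minimum Python version from setup.py's python_requires (>= 3.10.9).
MIN_PYTHON = (3, 10, 9)

def meets_python_requirement(version=sys.version_info[:3], minimum=MIN_PYTHON):
    """Return True if the interpreter satisfies the minimum version."""
    return tuple(version) >= tuple(minimum)

if __name__ == "__main__":
    print("Python OK:", meets_python_requirement())
    # nvidia-smi on PATH is a rough proxy for a usable NVIDIA driver.
    print("nvidia-smi found:", shutil.which("nvidia-smi") is not None)
```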
## Dependencies

### System Packages
- `git-lfs` (for pushing models to HuggingFace Hub)
- CUDA Toolkit 12.6 (matching PyTorch whl index)
### Python Packages (Core)
- `torch` >= 2.6.0
- `transformers` >= 4.53.3
- `accelerate` >= 1.9.0
- `datasets` >= 4.0.0
- `numpy` >= 1.24.2
- `safetensors` >= 0.5.3
- `packaging` >= 23.0
- `tqdm` >= 4.64.1
- `jinja2` >= 3.0.0
- `scipy`
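The version floors above can be checked programmatically. This is a hedged sketch using only the standard library: the dotted-numeric comparison in `version_at_least` is deliberately naive (it ignores suffixes like `.post1`), so a real install check should use `packaging.version.Version` instead. The helper names and the subset of packages checked are illustrative choices, not part of the handbook.

```python
from importlib import metadata

# A subset of the minimums from the core dependency list above.
CORE_MINIMUMS = {
    "torch": "2.6.0",
    "transformers": "4.53.3",
    "accelerate": "1.9.0",
    "datasets": "4.0.0",
}

def version_at_least(installed, minimum):
    """Naive dotted-numeric comparison; prefer packaging.version.Version
    in real code, which handles pre/post-release suffixes correctly."""
    to_tuple = lambda v: tuple(int(p) for p in v.split(".") if p.isdigit())
    return to_tuple(installed) >= to_tuple(minimum)

def check_core_packages(minimums=CORE_MINIMUMS):
    """Map each package to True/False depending on whether an installed
    version satisfying the minimum was found."""
    results = {}
    for name, minimum in minimums.items():
        try:
            results[name] = version_at_least(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            results[name] = False  # package not installed at all
    return results
```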
### Python Packages (Optional/Recommended)

- `flash-attn` == 2.7.4.post1 (Flash Attention 2; installed separately with `--no-build-isolation`)
- `wandb` (Weights & Biases logging)
- `tensorboard` (alternative logging)
## Credentials
The following environment variables or login steps are required:
- HuggingFace Login: Run `huggingface-cli login` to authenticate for model/dataset downloads and Hub pushes.
- `HF_TOKEN`: Alternatively, set this environment variable with your HuggingFace API token.
- `WANDB_API_KEY`: Weights & Biases API key (if using `report_to: wandb` in config).
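The two authentication paths above can be combined: prefer an explicit `HF_TOKEN` environment variable and otherwise fall back to the cached credentials written by `huggingface-cli login`. The helper below is a hypothetical sketch (the name `resolve_hf_token` is not part of the handbook); `huggingface_hub.login` does accept `token=None` and will then use cached or interactive login.

```python
import os

def resolve_hf_token(env=os.environ):
    """Hypothetical helper: return HF_TOKEN if set, else None so that
    huggingface_hub falls back to the cached `huggingface-cli login`
    credentials."""
    return env.get("HF_TOKEN") or None

# Usage sketch:
#   from huggingface_hub import login
#   login(token=resolve_hf_token())
```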
## Quick Install

```shell
# Create a virtual environment
uv venv handbook --python 3.11 && source handbook/bin/activate && uv pip install --upgrade pip

# Install PyTorch with CUDA 12.6
uv pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu126

# Install alignment-handbook and all dependencies
uv pip install .

# Install Flash Attention 2 (requires torch to be installed first)
uv pip install "flash-attn==2.7.4.post1" --no-build-isolation

# Log in to the Hugging Face Hub
huggingface-cli login

# Install git-lfs for pushing models
sudo apt-get install git-lfs
```
## Code Evidence

Python version requirement from `setup.py:136`:

```python
python_requires=">=3.10.9",
```

PyTorch version requirement from `setup.py:67`:

```python
"torch>=2.6.0",
```

CUDA/bfloat16 usage evidence from `src/alignment/model_utils.py:39-41`:

```python
torch_dtype = (
    model_args.torch_dtype if model_args.torch_dtype in ["auto", None] else getattr(torch, model_args.torch_dtype)
)
```
All recipe configs specify `bf16: true` and `torch_dtype: bfloat16`, requiring NVIDIA GPUs with bfloat16 support (Ampere or newer).
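The dtype-resolution logic in `model_utils.py` can be sketched without torch installed. The snippet below substitutes a hypothetical stand-in object for the torch module so it runs anywhere; in the real code, `getattr` is called on `torch` itself, mapping the config string `"bfloat16"` to `torch.bfloat16`.

```python
from types import SimpleNamespace

# Stand-in for the torch module so the sketch runs without a GPU stack.
torch_stub = SimpleNamespace(bfloat16="torch.bfloat16", float16="torch.float16")

def resolve_torch_dtype(requested, torch_mod=torch_stub):
    # "auto" and None pass through unchanged; any other string is looked
    # up as an attribute, e.g. "bfloat16" -> torch.bfloat16.
    return requested if requested in ("auto", None) else getattr(torch_mod, requested)
```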
README installation instructions from `README.md:73-79`:
Next, install PyTorch `v2.6.0`
uv pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu126
Note that the precise version is important for reproducibility!
## Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `CUDA out of memory` | Model too large for GPU VRAM | Use QLoRA (load_in_4bit: true) or switch to multi-GPU with ZeRO-3 |
| `RuntimeError: FlashAttention only supports Ampere GPUs or newer` | GPU does not support Flash Attention 2 | Remove `attn_implementation: flash_attention_2` from config or upgrade GPU |
| `ImportError: No module named 'flash_attn'` | Flash Attention not installed | Run `uv pip install "flash-attn==2.7.4.post1" --no-build-isolation` |
| `torch.cuda.is_available()` returns `False` | CUDA not properly installed, or CPU-only wheel installed | Reinstall PyTorch from the matching CUDA wheel index (e.g. cu126) |
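For the OOM case, the switch to QLoRA is a config change rather than a code change. A hypothetical recipe excerpt (key names follow the handbook's PEFT/quantization model arguments; the LoRA rank and alpha values are illustrative, not prescribed):

```yaml
# Hypothetical recipe excerpt: 4-bit QLoRA settings to reduce VRAM usage
load_in_4bit: true
use_peft: true
lora_r: 16
lora_alpha: 32
bf16: true
torch_dtype: bfloat16
```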
## Compatibility Notes
- bfloat16: All recipes use `bf16: true`. This requires NVIDIA Ampere (A100) or newer GPUs. Older GPUs (V100) support float16 but not bfloat16 natively.
- Flash Attention 2: Requires Ampere or newer GPUs. The `--no-build-isolation` flag is required during installation.
- Multi-GPU: Full fine-tuning uses DeepSpeed ZeRO-3 (8 GPUs). QLoRA works on a single 24GB GPU.
- PyTorch version: The README emphasizes using the exact version (2.6.0) for reproducibility.