
Environment: LLMBook-zh (llmbook-zh.github.io) PyTorch CUDA GPU Environment

From Leeroopedia


Knowledge Sources

Domains: Infrastructure, Deep_Learning
Last Updated: 2026-02-08 04:30 GMT

Overview

A Linux environment with NVIDIA CUDA GPU support, PyTorch with the CUDA backend, and NumPy, providing tensor computation for all LLM training, inference, and architecture code.

Description

This environment provides the foundational GPU-accelerated compute layer for all deep learning operations in the LLMBook codebase. PyTorch serves as the core tensor computation framework, used across 13+ source files for model architecture definitions (RMSNorm, RoPE, ALiBi, MoE, LLaMA), training loops (pre-training, SFT, LoRA, DPO), and inference/quantization workflows. CUDA GPU access is required for training scripts that use device_map="auto", mixed-precision BF16 training, and GPU memory monitoring via torch.cuda.memory_allocated().
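The CUDA facts this description mentions can be probed at runtime. A minimal sketch (the helper name is ours, not from the LLMBook codebase); it degrades gracefully when `torch` or a GPU is absent:

```python
def describe_compute_environment() -> dict:
    """Report the facts the LLMBook scripts rely on: whether torch is
    importable, whether a CUDA device is visible, and its name and
    currently allocated memory."""
    try:
        import torch
    except ImportError:  # torch not installed at all
        return {"torch": False, "cuda": False}
    info = {"torch": True, "cuda": torch.cuda.is_available()}
    if info["cuda"]:
        info["device"] = torch.cuda.get_device_name(0)
        # Same quantity the bitsandbytes script prints, here in GiB
        info["allocated_gib"] = torch.cuda.memory_allocated() / 1024**3
    return info
```

On a CPU-only machine this returns `{"torch": True, "cuda": False}`, which is sufficient for the Ch. 4 preprocessing scripts but not for training.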

Usage

Use this environment for all training, fine-tuning, and inference workflows in the LLMBook codebase. It is a mandatory prerequisite for every Implementation that imports torch or uses nn.Module subclasses. The GPU requirement is especially critical for pre-training (Ch. 6), SFT (Ch. 7), DPO alignment (Ch. 8), and quantization/inference (Ch. 9).

System Requirements

| Category | Requirement | Notes |
|----------|-------------|-------|
| OS | Linux (Ubuntu recommended) | CUDA toolkit requires Linux for full support |
| Hardware | NVIDIA GPU with CUDA support | Minimum 8 GB VRAM for quantized models; 16 GB+ for full-precision training of 7B models |
| Hardware | BF16-capable GPU | Ampere (A100, RTX 30 series) or newer for native BF16; Ada (RTX 40 series) also supported |
| Disk | 50 GB+ SSD | For model weights (a 7B model is ~14 GB in FP16) and dataset caching |
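The VRAM and disk figures above follow from parameter count times bytes per parameter. A back-of-envelope helper (our naming; decimal GB, as in the table; activations, optimizer state, and KV cache come on top of the weights):

```python
def model_weight_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in decimal GB: params * bytes/param / 1e9."""
    return n_params * bytes_per_param / 1e9

# A 7B-parameter model at common precisions:
fp16_gb = model_weight_gb(7e9, 2)    # FP16/BF16: 2 bytes per parameter
int8_gb = model_weight_gb(7e9, 1)    # 8-bit quantized
int4_gb = model_weight_gb(7e9, 0.5)  # 4-bit quantized
```

The FP16 figure reproduces the ~14 GB quoted for the disk requirement; the 4-bit figure (~3.5 GB) is why quantized 7B models fit in 8 GB of VRAM.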

Dependencies

System Packages

  • `nvidia-driver` >= 525
  • `cuda-toolkit` >= 11.7

Python Packages

  • `torch` >= 1.13 (with CUDA support)
  • `numpy` >= 1.21
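A quick way to check installed versions against these minimums, ignoring CUDA local tags such as `+cu117`. This is a simplified major.minor comparison (our helper; a real project would use `packaging.version`):

```python
def meets_minimum(installed: str, minimum: str) -> bool:
    """True if installed >= minimum, comparing only major.minor and
    stripping local version tags like '+cu117'."""
    def parse(v: str):
        return tuple(int(p) for p in v.split("+")[0].split(".")[:2])
    return parse(installed) >= parse(minimum)
```

Usage: `meets_minimum(torch.__version__, "1.13")` and `meets_minimum(numpy.__version__, "1.21")`.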

Credentials

No credentials required for this base environment.

Quick Install

# Install PyTorch with CUDA support
pip install torch numpy

# Verify CUDA availability
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

Code Evidence

GPU memory monitoring from `code/9.3 bitsandbytes实践.py:7`:

print(f"memory usage: {torch.cuda.memory_allocated()/1000/1000/1000} GB")

Automatic device mapping from `code/9.3 bitsandbytes实践.py:6`:

model_8bit = AutoModelForCausalLM.from_pretrained(name, device_map="auto", load_in_8bit=True)

BF16 mixed precision requirement from `code/6.2 预训练实践.py:37-38`:

bf16: bool = HfArg(
    default=True,
    help="Whether to use bf16 (mixed) precision instead of 32-bit.",
)

PyTorch tensor operations used extensively across architecture files, e.g. `code/5.1 RMSNorm.py`, `code/5.2 RoPE.py`, `code/5.4 MoE.py`.
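As an illustration of the tensor math those architecture files implement, here is a NumPy analogue of RMSNorm (a sketch of the technique, not the code from `code/5.1 RMSNorm.py`):

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm: rescale by the reciprocal root-mean-square over the last axis.
    Unlike LayerNorm, no mean is subtracted and no bias is added."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight
```

The normalized output has unit mean square per row, which the learnable `weight` then rescales per feature.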

Common Errors

| Error Message | Cause | Solution |
|---------------|-------|----------|
| `RuntimeError: CUDA out of memory` | Insufficient GPU VRAM for the model size | Use quantization (4-bit/8-bit) or reduce the batch size |
| `RuntimeError: No CUDA GPUs are available` | No NVIDIA GPU detected | Verify `nvidia-smi` output; install NVIDIA drivers |
| `RuntimeError: expected scalar type BFloat16` | GPU does not support BF16 | Use `fp16=True` instead of `bf16=True` on pre-Ampere GPUs |
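For the out-of-memory case specifically, a common pattern is to catch the error and retry with a smaller batch. A hedged sketch (names are ours; `step` stands in for a real forward/backward call):

```python
def run_with_oom_fallback(step, batch, min_batch=1):
    """Call step(batch); on a CUDA OOM RuntimeError, halve the batch and
    retry. Any other error, or an OOM at the minimum batch size, is
    re-raised unchanged."""
    while True:
        try:
            return step(batch)
        except RuntimeError as err:
            if "out of memory" not in str(err) or len(batch) <= min_batch:
                raise
            batch = batch[: len(batch) // 2]  # halve and retry
```

In a real training loop one would also call `torch.cuda.empty_cache()` between retries so the failed attempt's cached blocks are released.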

Compatibility Notes

  • Pre-Ampere GPUs (V100, RTX 2080): BF16 not natively supported; use FP16 mixed precision instead.
  • Multi-GPU: DeepSpeed integration available in LoRA training script (`code/7.4 LoRA实践.py`) for distributed training.
  • CPU-only: Data preprocessing scripts (Ch. 4: quality filtering, deduplication, BPE) do not require GPU.
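The first note can be turned into code: `torch.cuda.is_bf16_supported()` reports native BF16 support, and mapping it to the mixed-precision flags is straightforward (helper name is ours):

```python
def precision_flags(bf16_supported: bool) -> dict:
    """Choose HF TrainingArguments-style mixed-precision flags: BF16 on
    Ampere or newer GPUs, FP16 fallback on older ones (V100, RTX 2080)."""
    return {"bf16": bf16_supported, "fp16": not bf16_supported}

# On a live system: precision_flags(torch.cuda.is_bf16_supported())
```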

Related Pages

Connected node types:

  • Principle
  • Implementation
  • Heuristic
  • Environment