Environment:LLMBook_zh_LLMBook_zh_github_io_PyTorch_CUDA_GPU_Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Deep_Learning |
| Last Updated | 2026-02-08 04:30 GMT |
Overview
Linux environment with NVIDIA CUDA GPU support, PyTorch with CUDA backend, and NumPy for tensor computation across all LLM training, inference, and architecture code.
Description
This environment provides the foundational GPU-accelerated compute layer for all deep learning operations in the LLMBook codebase. PyTorch serves as the core tensor computation framework, used across 13+ source files for model architecture definitions (RMSNorm, RoPE, ALiBi, MoE, LLaMA), training loops (pre-training, SFT, LoRA, DPO), and inference/quantization workflows. CUDA GPU access is required for training scripts that use device_map="auto", mixed-precision BF16 training, and GPU memory monitoring via torch.cuda.memory_allocated().
Usage
Use this environment for all training, fine-tuning, and inference workflows in the LLMBook codebase. It is a mandatory prerequisite for every Implementation that imports torch or uses nn.Module subclasses. The GPU requirement is especially critical for pre-training (Ch. 6), SFT (Ch. 7), DPO alignment (Ch. 8), and quantization/inference (Ch. 9).
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu recommended) | CUDA toolkit requires Linux for full support |
| Hardware | NVIDIA GPU with CUDA support | Minimum 8GB VRAM for quantized models; 16GB+ for full-precision training of 7B models |
| Hardware | BF16-capable GPU | Ampere (A100) or newer for native BF16; RTX 30/40 series also supported |
| Disk | 50GB+ SSD | For model weights (7B model ~14GB in fp16) and dataset caching |
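The hardware rows above can be checked programmatically. A minimal sketch (not part of the LLMBook code) that reports the visible GPU's name, VRAM, and compute capability, assuming device 0 is the training GPU when CUDA is present:

```python
import torch

def describe_gpu() -> str:
    """Summarize device 0 against the hardware table above.
    Sketch only; returns a fixed message on CPU-only machines."""
    if not torch.cuda.is_available():
        return "No CUDA GPU detected"
    p = torch.cuda.get_device_properties(0)
    return (f"{p.name}: {p.total_memory / 2**30:.1f} GiB VRAM, "
            f"compute capability {p.major}.{p.minor}")

print(describe_gpu())
```

Compute capability 8.0 or higher (Ampere) satisfies the BF16 row; `total_memory` can be compared against the 8GB/16GB VRAM thresholds.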
Dependencies
System Packages
- `nvidia-driver` >= 525
- `cuda-toolkit` >= 11.7
Python Packages
- `torch` >= 1.13 (with CUDA support)
- `numpy` >= 1.21
Credentials
No credentials required for this base environment.
Quick Install
```shell
# Install PyTorch with CUDA support
pip install torch numpy

# Verify CUDA availability
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
```
Code Evidence
GPU memory monitoring from `code/9.3 bitsandbytes实践.py:7`:
```python
print(f"memory usage: {torch.cuda.memory_allocated()/1000/1000/1000} GB")
```
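Note that the book's snippet divides by 1000³, i.e. decimal GB. A hypothetical helper (not in the LLMBook code) reporting binary GiB instead, safe on CPU-only machines:

```python
import torch

def allocated_gib() -> float:
    """Currently allocated CUDA memory in GiB (binary units).
    Hypothetical helper; returns 0.0 when no GPU is present."""
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.memory_allocated() / 2**30
```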
Automatic device mapping from `code/9.3 bitsandbytes实践.py:6`:
```python
model_8bit = AutoModelForCausalLM.from_pretrained(name, device_map="auto", load_in_8bit=True)
```
BF16 mixed precision requirement from `code/6.2 预训练实践.py:37-38`:
```python
bf16: bool = HfArg(
    default=True,
    help="Whether to use bf16 (mixed) precision instead of 32-bit.",
)
```
PyTorch tensor operations used extensively across architecture files, e.g. `code/5.1 RMSNorm.py`, `code/5.2 RoPE.py`, `code/5.4 MoE.py`.
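As an illustration of the kind of tensor code in those architecture files, here is a minimal RMSNorm sketch in the spirit of `code/5.1 RMSNorm.py`; the book's exact implementation and hyperparameters may differ:

```python
import torch
from torch import nn

class RMSNorm(nn.Module):
    """Minimal RMSNorm sketch (hedged; see code/5.1 RMSNorm.py for the original)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale by the reciprocal root-mean-square over the last dimension
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * inv_rms)

x = torch.randn(2, 4, 8)
print(RMSNorm(8)(x).shape)  # torch.Size([2, 4, 8])
```

This runs on CPU as well, but the training scripts above assume the module is moved to a CUDA device.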
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `RuntimeError: CUDA out of memory` | Insufficient GPU VRAM for model size | Use quantization (4-bit/8-bit) or reduce batch size |
| `RuntimeError: No CUDA GPUs are available` | No NVIDIA GPU detected | Verify `nvidia-smi` output; install NVIDIA drivers |
| `RuntimeError: expected scalar type BFloat16` | GPU does not support BF16 | Use `fp16=True` instead of `bf16=True` on pre-Ampere GPUs |
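For the OOM row, one common mitigation pattern is to retry with a halved batch size. A hypothetical sketch (`step` is any callable taking a batch size; `torch.cuda.OutOfMemoryError` requires `torch` >= 1.13, consistent with the dependency list above):

```python
import torch

def run_with_backoff(step, batch_size, min_batch=1):
    """Retry `step` with a halved batch size on CUDA OOM.
    Hypothetical pattern, not part of the LLMBook code."""
    while batch_size >= min_batch:
        try:
            return step(batch_size)
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()  # release cached blocks before retrying
            batch_size //= 2
    raise RuntimeError("CUDA OOM even at the minimum batch size")
```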
Compatibility Notes
- Pre-Ampere GPUs (V100, RTX 2080): BF16 not natively supported; use FP16 mixed precision instead.
- Multi-GPU: DeepSpeed integration available in LoRA training script (`code/7.4 LoRA实践.py`) for distributed training.
- CPU-only: Data preprocessing scripts (Ch. 4: quality filtering, deduplication, BPE) do not require GPU.
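The BF16/FP16 choice above can be automated. A sketch (hypothetical helper, not in the LLMBook code) that maps Ampere-or-newer GPUs (compute capability >= 8) to BF16, older GPUs to FP16, and CPU-only machines to FP32:

```python
import torch

def pick_precision() -> str:
    """Choose a mixed-precision mode per the compatibility notes above.
    Sketch only; assumes device 0 is the training GPU."""
    if not torch.cuda.is_available():
        return "fp32"  # CPU fallback: no mixed precision
    major, _ = torch.cuda.get_device_capability()
    return "bf16" if major >= 8 else "fp16"  # Ampere is capability 8.x
```

The returned string could drive the `bf16`/`fp16` training arguments shown in the Code Evidence section.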
Related Pages
- Implementation:LLMBook_zh_LLMBook_zh_github_io_AutoModelForCausalLM_From_Pretrained_Pretraining
- Implementation:LLMBook_zh_LLMBook_zh_github_io_LlamaForCausalLM_Forward
- Implementation:LLMBook_zh_LLMBook_zh_github_io_Trainer_Train_Pretraining
- Implementation:LLMBook_zh_LLMBook_zh_github_io_Trainer_Save_Model_Pretraining
- Implementation:LLMBook_zh_LLMBook_zh_github_io_AutoModelForCausalLM_From_Pretrained_SFT
- Implementation:LLMBook_zh_LLMBook_zh_github_io_Trainer_Train_SFT
- Implementation:LLMBook_zh_LLMBook_zh_github_io_LoRALinear
- Implementation:LLMBook_zh_LLMBook_zh_github_io_LoraConfig_Get_Peft_Model
- Implementation:LLMBook_zh_LLMBook_zh_github_io_Trainer_Train_LoRA
- Implementation:LLMBook_zh_LLMBook_zh_github_io_AutoPeftModelForCausalLM_Merge_And_Unload
- Implementation:LLMBook_zh_LLMBook_zh_github_io_LlamaRewardModel
- Implementation:LLMBook_zh_LLMBook_zh_github_io_AutoModelForCausalLM_From_Pretrained_DPO
- Implementation:LLMBook_zh_LLMBook_zh_github_io_DPOTrainer_Train
- Implementation:LLMBook_zh_LLMBook_zh_github_io_VLLM_LLM_Generate
- Implementation:LLMBook_zh_LLMBook_zh_github_io_Quantize_Func
- Implementation:LLMBook_zh_LLMBook_zh_github_io_AutoModelForCausalLM_From_Pretrained_Bitsandbytes
- Implementation:LLMBook_zh_LLMBook_zh_github_io_GPTQConfig_Quantization