
Heuristic: BF16 Mixed Precision Default (LLMBook-zh/LLMBook-zh.github.io)

From Leeroopedia




Knowledge Sources
Domains: LLMs, Optimization, Training
Last Updated: 2026-02-08 04:30 GMT

Overview

Default to BF16 (bfloat16) mixed precision training for all LLM training and fine-tuning workflows.

Description

All training scripts in the codebase (pre-training, SFT, LoRA, DPO) set bf16=True as the default precision. BFloat16 uses 8 exponent bits and 7 mantissa bits, providing the same dynamic range as FP32 but with reduced precision. This avoids the overflow/underflow issues common with FP16, making it the preferred choice for LLM training on modern GPUs.

Usage

Use BF16 as the default precision for any training workflow. Switch to FP16 only if running on pre-Ampere GPUs (V100, RTX 2080) that lack native BF16 support. Use FP32 only when debugging numerical precision issues.
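The decision rule above can be sketched as a small helper. The function name and the capability-tuple convention are illustrative, not part of the source codebase; on a CUDA machine, such a tuple is what `torch.cuda.get_device_capability()` returns, and Ampere corresponds to compute capability 8.x.

```python
def pick_precision(compute_capability):
    """Pick a mixed-precision mode from a CUDA compute capability tuple.

    Ampere (sm_80) and newer GPUs have native BF16 support; older GPUs
    such as V100 (sm_70) or RTX 2080 (sm_75) fall back to FP16.
    """
    major, _minor = compute_capability
    return "bf16" if major >= 8 else "fp16"

pick_precision((8, 0))  # A100  -> "bf16"
pick_precision((7, 0))  # V100  -> "fp16"
```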

The Insight (Rule of Thumb)

  • Action: Set `bf16=True` in `TrainingArguments`.
  • Value: Enabled by default across pre-training, SFT, and DPO scripts.
  • Trade-off: ~50% memory reduction vs FP32 with negligible accuracy loss. BF16 avoids the gradient scaling complexity required by FP16.
  • Hardware: Requires Ampere (A100) or newer NVIDIA GPU for native support.
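With the Hugging Face `Trainer`, the action above is a single flag. A minimal configuration sketch (the output directory name is illustrative):

```python
from transformers import TrainingArguments

# BF16 mixed precision is one flag on TrainingArguments; the Trainer
# then runs forward/backward passes in bfloat16 with FP32 master weights.
args = TrainingArguments(
    output_dir="out",  # illustrative path
    bf16=True,         # mixed-precision training in bfloat16
)
```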

Reasoning

BF16 maintains the full dynamic range of FP32 (8 exponent bits), unlike FP16 which has only 5 exponent bits and frequently causes overflow in LLM training (especially in attention logits and loss values). The reduced mantissa precision (7 bits vs 23 in FP32) causes minimal accuracy loss for LLM training. Google Brain developed BF16 specifically for deep learning, and it has become the standard for LLM training since the introduction of Ampere GPUs.
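The dynamic-range claim can be checked arithmetically. For an IEEE-style binary format, the largest finite value is (2 − 2^−m) · 2^E, where m is the number of mantissa bits and E the maximum unbiased exponent. A short sketch (the helper is written here for illustration):

```python
def max_finite(exponent_bits, mantissa_bits):
    """Largest finite value of an IEEE-style float with the given field widths."""
    bias = 2 ** (exponent_bits - 1) - 1
    # The all-ones exponent pattern is reserved for inf/NaN.
    max_exp = (2 ** exponent_bits - 2) - bias
    return (2 - 2 ** -mantissa_bits) * 2.0 ** max_exp

max_finite(5, 10)   # FP16: 65504.0 -- easily overflowed by attention logits
max_finite(8, 7)    # BF16: ~3.39e38 -- same range as FP32
max_finite(8, 23)   # FP32: ~3.40e38
```

BF16 and FP32 share 8 exponent bits, so they overflow at essentially the same magnitude; FP16's 5 exponent bits cap it at 65504, which is why FP16 training needs loss scaling and BF16 does not.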

Code Evidence:

Pre-training default from `code/6.2 预训练实践.py:37-39`:

# Use BF16 mixed-precision training
bf16: bool = HfArg(
    default=True,
    help="Whether to use bf16 (mixed) precision instead of 32-bit.",
)

SFT default from `code/7.1 SFT实践.py:41-44`:

# Use BF16 mixed-precision training
bf16: bool = HfArg(
    default=True,
    help="Whether to use bf16 (mixed) precision instead of 32-bit.",
)

DPO default from `code/8.2 DPO实践.py:23-26`:

# Use BF16 mixed-precision training
bf16: bool = HfArg(
    default=True,
    help="Whether to use bf16 (mixed) precision instead of 32-bit.",
)
