Implementation:Hiyouga LLaMA Factory FP8 Utils
| Knowledge Sources | |
|---|---|
| Domains | Mixed Precision, Training Optimization |
| Last Updated | 2026-02-06 19:00 GMT |
Overview
Provides FP8 mixed-precision training configuration, environment setup, and Accelerator patching for HuggingFace-based training.
Description
The fp8_utils module enables FP8 (8-bit floating point) training through HuggingFace Accelerate. It supports two backends:
- TorchAO (the default), which uses rowwise scaling via Float8LinearConfig.
- Transformer Engine, best suited to Hopper GPUs, which uses the HYBRID FP8 format (E4M3 in the forward pass, E5M2 in the backward pass).
The module exposes five functions:
- create_fp8_kwargs builds the backend's recipe kwargs. These include a module filter function that skips embedding and head layers, as well as any layer whose dimensions are not divisible by 16, for kernel compatibility.
- get_fp8_mixed_precision returns "fp8" when FP8 is enabled and None otherwise.
- configure_fp8_environment sets environment variables, including ACCELERATE_MIXED_PRECISION and, optionally, FP8_BACKEND and FP8_ENABLE_FSDP_FLOAT8_ALL_GATHER.
- verify_fp8_status checks that FP8 is actually in effect after model preparation.
- patch_accelerator_for_fp8 monkey-patches Accelerator.__init__ to inject the FP8 recipe kwargs and force mixed_precision='fp8'. Because the HuggingFace Trainer does not natively pass kwargs_handlers, this patch is necessary for the Transformer Engine integration.
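The module filter described above can be sketched as a plain predicate. It follows the `(module, fully-qualified name) -> bool` shape that TorchAO uses for module filter callbacks, but the skip list (`"embed"`, `"lm_head"`) and the `StandInLinear` class are assumptions made so the example runs without PyTorch; the real filter in fp8_utils.py may match different layer names.

```python
from dataclasses import dataclass


@dataclass
class StandInLinear:
    """Hypothetical stand-in for torch.nn.Linear (shape attributes only)."""
    in_features: int
    out_features: int


# Assumed name fragments for embedding and output-head layers.
SKIP_NAME_PARTS = ("embed", "lm_head")


def fp8_module_filter(module, fqn: str) -> bool:
    """Return True if the layer at `fqn` should be converted to FP8."""
    # Keep embedding and output-head layers in higher precision.
    if any(part in fqn for part in SKIP_NAME_PARTS):
        return False
    # Only modules with linear-layer shape attributes are candidates.
    if not (hasattr(module, "in_features") and hasattr(module, "out_features")):
        return False
    # Skip layers whose dimensions are not multiples of 16 (kernel compatibility).
    return module.in_features % 16 == 0 and module.out_features % 16 == 0


layers = {
    "model.embed_tokens": StandInLinear(4096, 32000),
    "model.layers.0.mlp.up_proj": StandInLinear(4096, 11008),
    "model.layers.0.mlp.odd_proj": StandInLinear(4096, 1000),
    "lm_head": StandInLinear(4096, 32000),
}
converted = [name for name, mod in layers.items() if fp8_module_filter(mod, name)]
# converted == ["model.layers.0.mlp.up_proj"]
```

Only the MLP projection survives: the embedding and head are skipped by name, and `odd_proj` is skipped because 1000 is not a multiple of 16.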
Usage
Use these utilities when enabling FP8 training for improved throughput on supported hardware (NVIDIA Hopper GPUs). The functions are called during training setup when the fp8 flag is enabled in the training arguments.
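A sketch of how the environment setup might be gated on the fp8 flag. The variable names come from the description of configure_fp8_environment; the `FP8Args` dataclass, the string values (`"fp8"`, `"true"`), and the rule of setting FP8_BACKEND only for a non-"auto" backend are assumptions, not the module's confirmed behavior.

```python
import os
from dataclasses import dataclass


@dataclass
class FP8Args:
    """Hypothetical subset of TrainingArguments used by this sketch."""
    fp8: bool = False
    fp8_backend: str = "auto"  # "auto", "torchao", or "te"
    fp8_enable_fsdp_float8_all_gather: bool = False


def fp8_env_vars(args: FP8Args) -> dict[str, str]:
    """Build the environment variables an FP8 setup step might export."""
    if not args.fp8:
        return {}  # FP8 disabled: leave the environment alone
    env = {"ACCELERATE_MIXED_PRECISION": "fp8"}
    # Assumption: only pin a backend when the user chose one explicitly.
    if args.fp8_backend != "auto":
        env["FP8_BACKEND"] = args.fp8_backend
    if args.fp8_enable_fsdp_float8_all_gather:
        env["FP8_ENABLE_FSDP_FLOAT8_ALL_GATHER"] = "true"
    return env


def apply_env(env: dict[str, str]) -> None:
    # Export before the Accelerator is created so its init can see the values.
    os.environ.update(env)
```

Keeping the mapping in a pure function makes it easy to test separately from the side effect of writing `os.environ`.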
Code Reference
Source Location
- Repository: Hiyouga_LLaMA_Factory
- File: src/llamafactory/train/fp8_utils.py
- Lines: 1-229
Signature
```python
def create_fp8_kwargs(training_args: "TrainingArguments") -> list[Any]
def get_fp8_mixed_precision(training_args: "TrainingArguments") -> Optional[str]
def configure_fp8_environment(training_args: "TrainingArguments") -> None
def verify_fp8_status(accelerator, training_args: "TrainingArguments") -> None
def patch_accelerator_for_fp8() -> None
```
Import
```python
from llamafactory.train.fp8_utils import (
    create_fp8_kwargs,
    get_fp8_mixed_precision,
    configure_fp8_environment,
    verify_fp8_status,
    patch_accelerator_for_fp8,
)
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| training_args | TrainingArguments | Yes | Training arguments containing fp8 (bool), fp8_backend ("auto", "torchao", or "te"), and fp8_enable_fsdp_float8_all_gather |
| accelerator | Accelerator | Conditional | Required by verify_fp8_status; the HuggingFace Accelerator instance after model preparation |
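The kind of post-preparation check that verify_fp8_status performs can be approximated as below. This sketch only compares the accelerator's `mixed_precision` attribute (which `accelerate.Accelerator` exposes) against the request; `check_fp8_active` is a hypothetical name, `SimpleNamespace` stands in for a prepared Accelerator, and the real function may inspect converted layers or log more detail.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


def check_fp8_active(accelerator: object, fp8_requested: bool) -> bool:
    """Return True unless FP8 was requested but is not in effect."""
    if not fp8_requested:
        return True  # nothing to verify
    mode = getattr(accelerator, "mixed_precision", None)
    if mode != "fp8":
        logger.warning("FP8 requested, but accelerator reports mixed_precision=%r", mode)
        return False
    return True


# Stand-in object instead of a real, prepared Accelerator.
ok = check_fp8_active(SimpleNamespace(mixed_precision="fp8"), fp8_requested=True)
```

Warning instead of raising lets training continue in a higher precision while still surfacing the silent fallback.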
Outputs
| Name | Type | Description |
|---|---|---|
| fp8_kwargs | list[Any] | List containing AORecipeKwargs or FP8RecipeKwargs if FP8 is enabled, empty list otherwise (from create_fp8_kwargs) |
| mixed_precision | Optional[str] | "fp8" if FP8 is enabled, None otherwise (from get_fp8_mixed_precision) |
| (none) | None | configure_fp8_environment, verify_fp8_status, and patch_accelerator_for_fp8 return None and work through side effects |
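The output contract in the table can be sketched with stand-in classes. `StandInAORecipe` and `StandInTERecipe` are placeholders for accelerate's AORecipeKwargs and FP8RecipeKwargs so the example runs without accelerate installed, and treating `"auto"` as TorchAO is an assumption based only on TorchAO being described as the default; the real function may resolve `"auto"` differently (for example, by hardware).

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Args:
    """Hypothetical subset of TrainingArguments."""
    fp8: bool = False
    fp8_backend: str = "auto"  # "auto", "torchao", or "te"


@dataclass
class StandInAORecipe:
    """Placeholder for accelerate's AORecipeKwargs (TorchAO backend)."""
    scaling: str = "rowwise"


@dataclass
class StandInTERecipe:
    """Placeholder for accelerate's FP8RecipeKwargs (Transformer Engine backend)."""
    fp8_format: str = "HYBRID"


def sketch_create_fp8_kwargs(args: Args) -> list[Any]:
    if not args.fp8:
        return []  # FP8 disabled: no recipe kwargs
    if args.fp8_backend == "te":
        return [StandInTERecipe()]
    # Assumption: "auto" resolves to TorchAO, the default backend.
    return [StandInAORecipe()]


def sketch_get_fp8_mixed_precision(args: Args) -> Optional[str]:
    return "fp8" if args.fp8 else None
```

Returning a list (possibly empty) lets the caller pass the result straight to `kwargs_handlers` without a separate branch.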
Usage Examples
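```python
# Setting up the FP8 training environment
from accelerate import Accelerator
from llamafactory.train.fp8_utils import configure_fp8_environment, create_fp8_kwargs, get_fp8_mixed_precision

# training_args: LLaMA Factory TrainingArguments with fp8 enabled (assumed to exist)
configure_fp8_environment(training_args)
fp8_kwargs = create_fp8_kwargs(training_args)
# Pass fp8_kwargs to the Accelerator via kwargs_handlers
accelerator = Accelerator(mixed_precision=get_fp8_mixed_precision(training_args), kwargs_handlers=fp8_kwargs)
```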
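When the Accelerator is built by the HuggingFace Trainer rather than by your own code, the kwargs cannot be passed directly, which is the case patch_accelerator_for_fp8 handles. The general monkey-patching pattern can be sketched as follows; `StandInAccelerator` and `patch_init_for_fp8` are hypothetical names, and the sketch assumes the caller passes keyword arguments, as the real module's behavior is only known from the description.

```python
import functools
from typing import Any


class StandInAccelerator:
    """Hypothetical stand-in for accelerate.Accelerator; records its init args."""

    def __init__(self, *, mixed_precision=None, kwargs_handlers=None, **kwargs):
        self.mixed_precision = mixed_precision
        self.kwargs_handlers = list(kwargs_handlers or [])


def patch_init_for_fp8(cls: type, fp8_kwargs: list[Any]) -> None:
    """Wrap cls.__init__ so every new instance gets FP8 settings injected."""
    original_init = cls.__init__
    if getattr(original_init, "_fp8_patched", False):
        return  # already patched; avoid wrapping twice

    @functools.wraps(original_init)
    def patched_init(self, *args, **kwargs):
        # Inject recipe kwargs only when the caller (e.g. Trainer) passed none.
        if not kwargs.get("kwargs_handlers"):
            kwargs["kwargs_handlers"] = list(fp8_kwargs)
        # Force FP8 regardless of what the caller requested.
        kwargs["mixed_precision"] = "fp8"
        original_init(self, *args, **kwargs)

    patched_init._fp8_patched = True
    cls.__init__ = patched_init


# Example: the patched stand-in now always comes up in FP8 mode.
patch_init_for_fp8(StandInAccelerator, fp8_kwargs=["fp8-recipe"])
acc = StandInAccelerator(mixed_precision="bf16")
```

Patching the class rather than an instance matters here, because the Trainer constructs its own Accelerator internally; the patch must therefore be applied before the Trainer is created.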
Related Pages
- Hiyouga_LLaMA_Factory_Quantization - Complementary quantization utilities for model compression
- Hiyouga_LLaMA_Factory_DPO_Workflow - Training workflow that may use FP8 configuration