Implementation:Hiyouga LLaMA Factory FP8 Utils

Knowledge Sources	Hiyouga_LLaMA_Factory
Domains	Mixed Precision, Training Optimization
Last Updated	2026-02-06 19:00 GMT

Overview

Provides FP8 mixed-precision training configuration, environment setup, and Accelerator patching for HuggingFace-based training.

Description

The fp8_utils module enables FP8 (8-bit floating point) training through HuggingFace Accelerate. It supports two backends: TorchAO (the default, using rowwise scaling via Float8LinearConfig) and Transformer Engine (best suited to Hopper GPUs, using the HYBRID FP8 format).

create_fp8_kwargs constructs the recipe kwargs for the selected backend, together with a module filter function. The filter skips embedding and head layers, and also skips layers whose dimensions are not divisible by 16, for kernel compatibility.

configure_fp8_environment sets environment variables, including ACCELERATE_MIXED_PRECISION and, optionally, FP8_BACKEND and FP8_ENABLE_FSDP_FLOAT8_ALL_GATHER.

verify_fp8_status checks, after model preparation, that FP8 is actually in effect.

patch_accelerator_for_fp8 monkey-patches Accelerator.__init__ to inject the FP8 recipe kwargs and force mixed_precision='fp8'. The patch covers the case where the HuggingFace Trainer does not natively pass kwargs_handlers, and it is necessary for Transformer Engine integration.
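The module filter described above can be illustrated with a minimal sketch. It is not the code from fp8_utils.py: the FakeLinear stand-in, the name fragments in SKIPPED_NAME_PARTS, and the choice to leave modules without feature dimensions untouched are all assumptions made so the example runs without PyTorch.

```python
from dataclasses import dataclass


@dataclass
class FakeLinear:
    """Stand-in for torch.nn.Linear so the sketch runs without PyTorch."""
    in_features: int
    out_features: int


# Hypothetical name fragments for layers kept in higher precision.
# The actual list used by fp8_utils.py may differ.
SKIPPED_NAME_PARTS = ("embed", "lm_head")


def module_filter(module, fqn: str) -> bool:
    """Return True if the layer named `fqn` should be converted to FP8."""
    # Skip embedding and head layers by (assumed) name fragments.
    if any(part in fqn for part in SKIPPED_NAME_PARTS):
        return False
    in_f = getattr(module, "in_features", None)
    out_f = getattr(module, "out_features", None)
    # Assumption: modules without linear dimensions are left untouched.
    if in_f is None or out_f is None:
        return False
    # FP8 kernels need both dimensions to be multiples of 16.
    return in_f % 16 == 0 and out_f % 16 == 0
```

With this sketch, a 4096x4096 projection is accepted, while a layer named lm_head or a layer with an output dimension of 100 is rejected.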

Usage

Use these utilities when enabling FP8 training for improved throughput on supported hardware (NVIDIA Hopper GPUs). The functions are called during training setup when the fp8 flag is enabled in training arguments.

Code Reference

Source Location

Repository: Hiyouga_LLaMA_Factory
File: src/llamafactory/train/fp8_utils.py
Lines: 1-229

Signature

def create_fp8_kwargs(training_args: "TrainingArguments") -> list[Any]

def get_fp8_mixed_precision(training_args: "TrainingArguments") -> Optional[str]

def configure_fp8_environment(training_args: "TrainingArguments") -> None

def verify_fp8_status(accelerator, training_args: "TrainingArguments") -> None

def patch_accelerator_for_fp8() -> None

Import

from llamafactory.train.fp8_utils import (
    create_fp8_kwargs,
    get_fp8_mixed_precision,
    configure_fp8_environment,
    verify_fp8_status,
    patch_accelerator_for_fp8,
)

I/O Contract

Inputs

Name	Type	Required	Description
training_args	TrainingArguments	Yes	Training arguments containing fp8 (bool), fp8_backend ("auto", "torchao", or "te"), and fp8_enable_fsdp_float8_all_gather
accelerator	Accelerator	Conditional	Required by verify_fp8_status; the HuggingFace Accelerator instance after model preparation

Outputs

Name	Type	Description
fp8_kwargs	list[Any]	List containing AORecipeKwargs or FP8RecipeKwargs if FP8 is enabled, empty list otherwise (from create_fp8_kwargs)
mixed_precision	Optional[str]	"fp8" if FP8 is enabled, None otherwise (from get_fp8_mixed_precision)
None	None	configure_fp8_environment, verify_fp8_status, and patch_accelerator_for_fp8 operate via side effects
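The contract above can be sketched in a few lines. This is an illustrative approximation, not the module's source: FakeTrainingArguments and the sketch_ function names are hypothetical, and the exact environment variable values ("fp8", the backend name, "true") and the rule that "auto" leaves FP8_BACKEND unset are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeTrainingArguments:
    """Hypothetical stand-in exposing only the fields listed under Inputs."""
    fp8: bool = False
    fp8_backend: str = "auto"
    fp8_enable_fsdp_float8_all_gather: bool = False


def sketch_get_fp8_mixed_precision(args) -> Optional[str]:
    # "fp8" when the flag is on, None otherwise, as in the Outputs table.
    return "fp8" if args.fp8 else None


def sketch_configure_fp8_environment(args) -> None:
    # Operates purely via side effects on os.environ.
    if not args.fp8:
        return
    os.environ["ACCELERATE_MIXED_PRECISION"] = "fp8"
    # Assumption: "auto" means no explicit backend is exported.
    if args.fp8_backend != "auto":
        os.environ["FP8_BACKEND"] = args.fp8_backend
    # Assumption: the flag is exported as the string "true".
    if args.fp8_enable_fsdp_float8_all_gather:
        os.environ["FP8_ENABLE_FSDP_FLOAT8_ALL_GATHER"] = "true"
```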

Usage Examples

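# Setting up the FP8 training environment.
# `training_args` is an existing TrainingArguments instance with fp8 enabled.
from llamafactory.train.fp8_utils import configure_fp8_environment, create_fp8_kwargs

configure_fp8_environment(training_args)       # side effects only: sets environment variables
fp8_kwargs = create_fp8_kwargs(training_args)  # empty list when FP8 is disabled
# Pass fp8_kwargs to Accelerator via kwargs_handlers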
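When the Accelerator is created by the Trainer rather than by user code, the kwargs cannot be passed directly, which is the situation patch_accelerator_for_fp8 addresses. The general monkey-patching pattern might look like the sketch below. FakeAccelerator and patch_init_for_fp8 are hypothetical names used for illustration; the real function takes no arguments and patches accelerate's Accelerator, and its internals may differ.

```python
import functools


class FakeAccelerator:
    """Stand-in for accelerate.Accelerator; it only stores what it receives."""

    def __init__(self, mixed_precision=None, kwargs_handlers=None):
        self.mixed_precision = mixed_precision
        self.kwargs_handlers = list(kwargs_handlers or [])


def patch_init_for_fp8(cls, fp8_kwargs):
    """Wrap cls.__init__ so every new instance receives the FP8 settings."""
    original_init = cls.__init__
    if getattr(original_init, "_fp8_patched", False):
        return  # already patched; avoid wrapping twice

    @functools.wraps(original_init)
    def patched_init(self, *args, **kwargs):
        # Assumes callers pass these two options as keyword arguments.
        handlers = list(kwargs.get("kwargs_handlers") or [])
        kwargs["kwargs_handlers"] = handlers + list(fp8_kwargs)
        kwargs["mixed_precision"] = "fp8"  # override whatever was requested
        original_init(self, *args, **kwargs)

    patched_init._fp8_patched = True
    cls.__init__ = patched_init
```

After patching, FakeAccelerator(mixed_precision="bf16") ends up with mixed_precision set to "fp8" and the recipe objects appended to kwargs_handlers. The guard flag keeps a second call from stacking a second wrapper.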

Related Pages

Hiyouga_LLaMA_Factory_Quantization - Complementary quantization utilities for model compression
Hiyouga_LLaMA_Factory_DPO_Workflow - Training workflow that may use FP8 configuration
