
Implementation:Hiyouga LLaMA Factory FP8 Utils

Revision as of 15:06, 16 February 2026 by Admin (Auto-imported from implementations/Hiyouga_LLaMA_Factory_FP8_Utils.md)


Knowledge Sources
Domains: Mixed Precision, Training Optimization
Last Updated: 2026-02-06 19:00 GMT

Overview

Provides FP8 mixed-precision training configuration, environment setup, and Accelerator patching for HuggingFace-based training.

Description

The fp8_utils module enables FP8 (8-bit floating point) training through HuggingFace Accelerate. It supports two backends: TorchAO (the default, using rowwise scaling via Float8LinearConfig) and Transformer Engine (best suited to Hopper GPUs, using the HYBRID FP8 format).

create_fp8_kwargs builds the recipe kwargs for the selected backend, including a module filter function that skips embedding and head layers, as well as layers whose dimensions are not divisible by 16, for kernel compatibility. configure_fp8_environment sets environment variables: ACCELERATE_MIXED_PRECISION always, and FP8_BACKEND and FP8_ENABLE_FSDP_FLOAT8_ALL_GATHER optionally. verify_fp8_status checks, after model preparation, that FP8 is actually in effect. patch_accelerator_for_fp8 monkey-patches Accelerator.__init__ to inject the FP8 recipe kwargs and force mixed_precision='fp8'. The patch is needed when the HuggingFace Trainer does not pass kwargs_handlers itself, and it is required for Transformer Engine integration.
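The module filter described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the function name fp8_module_filter, the SKIP_NAME_FRAGMENTS tuple, and the name-matching rule are assumptions, and SimpleNamespace objects stand in for torch.nn.Linear layers, which expose the same in_features and out_features attributes.

```python
from types import SimpleNamespace

# Hypothetical name fragments marking layers to leave in higher precision (assumption).
SKIP_NAME_FRAGMENTS = ("embed", "lm_head")


def fp8_module_filter(module, layer_name: str) -> bool:
    """Return True if the layer looks safe to convert to FP8."""
    # Skip embedding and output-head layers, identified here by name.
    lowered = layer_name.lower()
    if any(fragment in lowered for fragment in SKIP_NAME_FRAGMENTS):
        return False

    # Only layers exposing linear-style dimensions are candidates.
    in_features = getattr(module, "in_features", None)
    out_features = getattr(module, "out_features", None)
    if in_features is None or out_features is None:
        return False

    # Both dimensions must be divisible by 16 for kernel compatibility.
    return in_features % 16 == 0 and out_features % 16 == 0


# Stand-ins for torch.nn.Linear layers (illustrative shapes only).
up_proj = SimpleNamespace(in_features=4096, out_features=11008)
odd_proj = SimpleNamespace(in_features=4096, out_features=1000)
lm_head = SimpleNamespace(in_features=4096, out_features=32000)

keep_up_proj = fp8_module_filter(up_proj, "model.layers.0.mlp.up_proj")
keep_odd_proj = fp8_module_filter(odd_proj, "model.layers.0.mlp.odd_proj")
keep_lm_head = fp8_module_filter(lm_head, "lm_head")
```

In this sketch only up_proj passes: odd_proj fails the divisibility check because 1000 is not a multiple of 16, and lm_head is excluded by name even though its dimensions would qualify.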

Usage

Use these utilities when enabling FP8 training for improved throughput on supported hardware (NVIDIA Hopper GPUs). The functions are called during training setup when the fp8 flag is enabled in training arguments.

Code Reference

Source Location

Signature

def create_fp8_kwargs(training_args: "TrainingArguments") -> list[Any]

def get_fp8_mixed_precision(training_args: "TrainingArguments") -> Optional[str]

def configure_fp8_environment(training_args: "TrainingArguments") -> None

def verify_fp8_status(accelerator, training_args: "TrainingArguments") -> None

def patch_accelerator_for_fp8() -> None

Import

from llamafactory.train.fp8_utils import (
    create_fp8_kwargs,
    get_fp8_mixed_precision,
    configure_fp8_environment,
    verify_fp8_status,
    patch_accelerator_for_fp8,
)

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| training_args | TrainingArguments | Yes | Training arguments containing fp8 (bool), fp8_backend ("auto", "torchao", or "te"), and fp8_enable_fsdp_float8_all_gather |
| accelerator | Accelerator | Conditional | Required by verify_fp8_status; the HuggingFace Accelerator instance after model preparation |
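To show how the training_args fields might map onto the environment variables that configure_fp8_environment sets, here is a hedged sketch. The helper build_fp8_env is hypothetical, and the specific choices in it (omitting FP8_BACKEND when the backend is "auto", encoding the boolean flag as the string "true") are assumptions rather than the module's documented behavior.

```python
from types import SimpleNamespace


def build_fp8_env(training_args) -> dict[str, str]:
    """Collect FP8-related environment variables for the given arguments."""
    env: dict[str, str] = {}
    if not getattr(training_args, "fp8", False):
        return env  # FP8 disabled: nothing to export

    env["ACCELERATE_MIXED_PRECISION"] = "fp8"

    # Export a backend only when one was chosen explicitly (assumption).
    backend = getattr(training_args, "fp8_backend", "auto")
    if backend != "auto":
        env["FP8_BACKEND"] = backend

    # Assumed string encoding for the boolean flag.
    if getattr(training_args, "fp8_enable_fsdp_float8_all_gather", False):
        env["FP8_ENABLE_FSDP_FLOAT8_ALL_GATHER"] = "true"
    return env


# SimpleNamespace stands in for a real TrainingArguments instance.
te_args = SimpleNamespace(
    fp8=True, fp8_backend="te", fp8_enable_fsdp_float8_all_gather=True
)
auto_args = SimpleNamespace(
    fp8=True, fp8_backend="auto", fp8_enable_fsdp_float8_all_gather=False
)
off_args = SimpleNamespace(fp8=False)

te_env = build_fp8_env(te_args)
auto_env = build_fp8_env(auto_args)
off_env = build_fp8_env(off_args)
```

Returning a dictionary keeps the sketch free of side effects; applying it would be a matter of calling os.environ.update(te_env) before the Accelerator is created.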

Outputs

| Name | Type | Description |
|------|------|-------------|
| fp8_kwargs | list[Any] | List containing AORecipeKwargs or FP8RecipeKwargs if FP8 is enabled, empty list otherwise (from create_fp8_kwargs) |
| mixed_precision | Optional[str] | "fp8" if FP8 is enabled, None otherwise (from get_fp8_mixed_precision) |
| None | None | configure_fp8_environment, verify_fp8_status, and patch_accelerator_for_fp8 operate via side effects |

Usage Examples

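# Setting up the FP8 training environment.
# Assumes training_args is an existing TrainingArguments instance with fp8 enabled.
from llamafactory.train.fp8_utils import configure_fp8_environment, create_fp8_kwargs

# Export the FP8 environment variables before the Accelerator is created.
configure_fp8_environment(training_args)

# Build the backend-specific recipe kwargs (an empty list when FP8 is disabled).
fp8_kwargs = create_fp8_kwargs(training_args)

# Pass fp8_kwargs to Accelerator via kwargs_handlers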
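When the Trainer constructs the Accelerator itself, there is no call site at which to pass kwargs_handlers, which is the situation patch_accelerator_for_fp8 addresses. The general monkey-patching pattern can be sketched as below. DemoAccelerator and patch_init_for_fp8 are hypothetical stand-ins written for this illustration; the real function takes no arguments and patches accelerate's Accelerator, and its internals may differ.

```python
import functools


class DemoAccelerator:
    """Stand-in for accelerate.Accelerator (hypothetical, for illustration)."""

    def __init__(self, mixed_precision=None, kwargs_handlers=None):
        self.mixed_precision = mixed_precision
        self.kwargs_handlers = list(kwargs_handlers or [])


def patch_init_for_fp8(cls, fp8_kwargs):
    """Wrap cls.__init__ so every new instance is created in FP8 mode."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def patched_init(self, *args, **kwargs):
        # Inject the recipe only if the caller supplied no handlers.
        if not kwargs.get("kwargs_handlers"):
            kwargs["kwargs_handlers"] = list(fp8_kwargs)
        # Force FP8; this sketch assumes mixed_precision is passed by keyword.
        kwargs["mixed_precision"] = "fp8"
        original_init(self, *args, **kwargs)

    cls.__init__ = patched_init
    return original_init  # returned so the patch can be undone


recipe = object()  # placeholder for an AORecipeKwargs or FP8RecipeKwargs instance
original = patch_init_for_fp8(DemoAccelerator, [recipe])
patched = DemoAccelerator(mixed_precision="bf16")  # caller's choice is overridden

DemoAccelerator.__init__ = original  # undo the patch
unpatched = DemoAccelerator(mixed_precision="bf16")
```

Returning the original __init__ is a design choice of this sketch: it makes the patch reversible, which is convenient in tests where the class is shared across cases.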
