Implementation:Hiyouga LLaMA Factory Training Args

Knowledge Sources	Hiyouga_LLaMA_Factory
Domains	Training Configuration, Distributed Training
Last Updated	2026-02-06 19:00 GMT

Overview

Extends HuggingFace's Seq2SeqTrainingArguments with LlamaFactory-specific support for Ray distributed training, FP8 mixed precision, and MCA (Megatron-Core Adapter) backends.

Description

This module defines two dataclass argument groups, RayArguments and Fp8Arguments, and combines them with a base class into a single TrainingArguments dataclass through multiple inheritance. RayArguments configures Ray-based distributed training: the worker count, the keyword arguments for ray.init(), and the master address and port used for process-group initialization. Fp8Arguments enables FP8 mixed precision training through HuggingFace Accelerate and targets PyTorch 2.7+ on Hopper-architecture GPUs. The base class is selected when the module is imported: if the USE_MCA environment variable is enabled, TrainingArguments inherits from McaSeq2SeqTrainingArguments in mcore_adapter; otherwise it inherits from HuggingFace's standard Seq2SeqTrainingArguments.
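The import-time base selection described above can be sketched with stub classes. This is a minimal illustration under stated assumptions, not the module's code: HFSeq2SeqArgsStub, McaSeq2SeqArgsStub, env_enabled and select_base are hypothetical names, and the set of strings treated as "enabled" is an assumption.

```python
import os
from dataclasses import dataclass


# Hypothetical stand-ins for transformers' Seq2SeqTrainingArguments and
# mcore_adapter's McaSeq2SeqTrainingArguments, so the sketch has no dependencies.
@dataclass
class HFSeq2SeqArgsStub:
    output_dir: str = "./output"


@dataclass
class McaSeq2SeqArgsStub(HFSeq2SeqArgsStub):
    mca_only_option: int = 1  # hypothetical field standing in for MCA settings


def env_enabled(name: str) -> bool:
    # Assumption: "1", "true" or "y" (any case) counts as enabled.
    return os.getenv(name, "0").lower() in ("1", "true", "y")


def select_base(env_var: str = "USE_MCA") -> type:
    # In the real module this choice is made once, when the module is imported,
    # so the variable has to be set before importing it.
    return McaSeq2SeqArgsStub if env_enabled(env_var) else HFSeq2SeqArgsStub


if __name__ == "__main__":
    os.environ.pop("USE_MCA", None)
    print(select_base().__name__)  # HFSeq2SeqArgsStub
    os.environ["USE_MCA"] = "1"
    print(select_base().__name__)  # McaSeq2SeqArgsStub
```

Because the class statement for TrainingArguments names its base directly, the base must already be resolved at import time; changing USE_MCA afterwards has no effect on an already-imported module.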

Usage

The hyperparameter parser instantiates TrainingArguments, and the resulting object is passed to the framework's trainers. The fields defined in this module matter when a training run needs Ray distributed training, FP8 precision, or the Megatron-Core Adapter backend.
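The parser step can be pictured as routing keys from one flat configuration (for example, a YAML training config) to the dataclass that declares them. LlamaFactory's parser builds on transformers' HfArgumentParser; the stdlib-only function below is a hypothetical sketch of that routing, with ModelArgsStub and TrainArgsStub standing in for the real argument groups.

```python
from dataclasses import dataclass, fields


@dataclass
class ModelArgsStub:  # hypothetical stand-in for another argument group
    model_name_or_path: str = ""


@dataclass
class TrainArgsStub:  # hypothetical stand-in for TrainingArguments
    output_dir: str = "./output"
    ray_num_workers: int = 1
    fp8: bool = False


def parse_flat_config(config: dict, *groups: type) -> tuple:
    """Instantiate each dataclass group from the config keys it declares."""
    unused = set(config)
    instances = []
    for group in groups:
        names = {f.name for f in fields(group) if f.init}
        kwargs = {key: value for key, value in config.items() if key in names}
        unused.difference_update(kwargs)
        instances.append(group(**kwargs))
    # Reject keys that no group declares, instead of silently ignoring them.
    if unused:
        raise ValueError(f"Unknown arguments: {sorted(unused)}")
    return tuple(instances)


if __name__ == "__main__":
    model_args, training_args = parse_flat_config(
        {"model_name_or_path": "my-model", "output_dir": "./out", "fp8": True},
        ModelArgsStub,
        TrainArgsStub,
    )
    print(training_args.fp8)  # True
```

Like HfArgumentParser.parse_dict with its default settings, the sketch hands a key to every group that declares it and raises on keys nobody uses, which catches typos in config files early.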

Code Reference

Source Location

Repository: Hiyouga_LLaMA_Factory
File: src/llamafactory/hparams/training_args.py
Lines: 1-101

Signature

from dataclasses import dataclass

@dataclass
class RayArguments:
    ray_num_workers: int = 1
    ray_init_kwargs: dict | str | None = None
    master_addr: str | None = None
    master_port: str | None = None

@dataclass
class Fp8Arguments:
    fp8: bool = False
    fp8_backend: str = "auto"
    fp8_enable_fsdp_float8_all_gather: bool = False

# BaseTrainingArguments is McaSeq2SeqTrainingArguments (from mcore_adapter) when
# USE_MCA is enabled, otherwise transformers' Seq2SeqTrainingArguments.
@dataclass
class TrainingArguments(Fp8Arguments, RayArguments, BaseTrainingArguments):
    overwrite_output_dir: bool = False
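Because TrainingArguments is a dataclass with three dataclass parents, its fields are collected by walking the method resolution order in reverse. The runnable sketch below reproduces the class layout, with a hypothetical BaseStub in place of BaseTrainingArguments and one field per mixin, to show the resulting field order.

```python
from dataclasses import dataclass, fields


@dataclass
class BaseStub:  # hypothetical stand-in for BaseTrainingArguments
    output_dir: str = "./output"


@dataclass
class RayArguments:
    ray_num_workers: int = 1


@dataclass
class Fp8Arguments:
    fp8: bool = False


@dataclass
class TrainingArguments(Fp8Arguments, RayArguments, BaseStub):
    overwrite_output_dir: bool = False


# MRO: TrainingArguments -> Fp8Arguments -> RayArguments -> BaseStub -> object.
# dataclasses walks it in reverse, so the base class's fields come first.
print([f.name for f in fields(TrainingArguments)])
# ['output_dir', 'ray_num_workers', 'fp8', 'overwrite_output_dir']
```

Since the base class's fields come first and many of them carry defaults, every field added by RayArguments, Fp8Arguments or TrainingArguments must also have a default; otherwise the dataclass decorator raises TypeError for a non-default argument following a default one.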

Import

from llamafactory.hparams.training_args import TrainingArguments, RayArguments, Fp8Arguments

I/O Contract

Inputs

Name	Type	Required	Description
ray_num_workers	int	No (default: 1)	Number of workers for Ray distributed training
ray_init_kwargs	dict or str or None	No (default: None)	Keyword arguments passed to ray.init(); accepts a dict or a JSON string
master_addr	str or None	No (default: None)	Master address for init_process_group in distributed training
master_port	str or None	No (default: None)	Master port for init_process_group in distributed training
fp8	bool	No (default: False)	Enable FP8 mixed precision training via HuggingFace Accelerate
fp8_backend	str	No (default: "auto")	FP8 backend selection: auto, torchao, te (NVIDIA Transformer Engine), or msamp (Microsoft MS-AMP)
fp8_enable_fsdp_float8_all_gather	bool	No (default: False)	Enable FP8 optimizations for FSDP2 all-gather operations
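The fp8_backend row lists four accepted values. A minimal sketch of validating the FP8 group at construction time follows; Fp8ArgumentsSketch is a hypothetical name, and both checks (rejecting an unknown backend name, and rejecting the FSDP2 all-gather option without fp8=True) are assumptions about sensible validation rather than documented behavior of the module.

```python
from dataclasses import dataclass

# Accepted values as listed in the Inputs table; rejecting others is an assumption.
FP8_BACKENDS = ("auto", "torchao", "te", "msamp")


@dataclass
class Fp8ArgumentsSketch:
    fp8: bool = False
    fp8_backend: str = "auto"
    fp8_enable_fsdp_float8_all_gather: bool = False

    def __post_init__(self) -> None:
        if self.fp8_backend not in FP8_BACKENDS:
            raise ValueError(
                f"fp8_backend must be one of {FP8_BACKENDS}, got {self.fp8_backend!r}"
            )
        # Assumption: the FSDP2 float8 all-gather option only makes sense with FP8 on.
        if self.fp8_enable_fsdp_float8_all_gather and not self.fp8:
            raise ValueError("fp8_enable_fsdp_float8_all_gather requires fp8=True")


if __name__ == "__main__":
    print(Fp8ArgumentsSketch(fp8=True, fp8_backend="torchao"))
```

Failing in __post_init__ surfaces a misspelled backend when the arguments are parsed, before any GPU work starts.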

Outputs

Name	Type	Description
TrainingArguments instance	TrainingArguments	Fully composed training arguments dataclass combining Ray, FP8, and base HF training settings
use_ray	bool	Auto-detected attribute set in RayArguments.__post_init__ indicating whether Ray is active
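The Outputs table notes that RayArguments.__post_init__ sets use_ray, and the Inputs table says ray_init_kwargs may be a JSON string. The sketch below shows one way such a hook could handle both; RayArgumentsSketch and ray_enabled are hypothetical names, and reading a USE_RAY environment variable is an assumption about how Ray mode is switched on.

```python
from __future__ import annotations

import json
import os
from dataclasses import dataclass, field


def ray_enabled() -> bool:
    # Assumption: Ray mode is switched on by a truthy USE_RAY environment variable.
    return os.getenv("USE_RAY", "0").lower() in ("1", "true", "y")


@dataclass
class RayArgumentsSketch:
    ray_num_workers: int = 1
    ray_init_kwargs: dict | str | None = None
    master_addr: str | None = None
    master_port: str | None = None
    # Not a constructor argument; derived in __post_init__.
    use_ray: bool = field(default=False, init=False)

    def __post_init__(self) -> None:
        self.use_ray = ray_enabled()
        # A JSON string such as '{"num_cpus": 16}' becomes a dict for ray.init().
        if isinstance(self.ray_init_kwargs, str):
            self.ray_init_kwargs = json.loads(self.ray_init_kwargs)


if __name__ == "__main__":
    args = RayArgumentsSketch(ray_num_workers=4, ray_init_kwargs='{"num_cpus": 16}')
    print(args.ray_init_kwargs)  # {'num_cpus': 16}
```

Accepting a JSON string lets the same field be set from a command line or YAML file, while code that calls ray.init(**args.ray_init_kwargs) only ever sees a dict (or None).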

Usage Examples

# Basic training arguments with default settings
from llamafactory.hparams.training_args import TrainingArguments

args = TrainingArguments(
    output_dir="./output",
    per_device_train_batch_size=4,
    learning_rate=5e-5,
)

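# With Ray distributed training (Ray is only used when use_ray is detected
# as True in __post_init__, e.g. when launching with USE_RAY=1)
args = TrainingArguments(
    output_dir="./output",
    ray_num_workers=4,
    ray_init_kwargs='{"num_cpus": 16}',  # a JSON string is accepted as well as a dict
)

# With FP8 mixed precision on Hopper GPUs (PyTorch 2.7+; the torchao backend
# also needs the torchao package installed)
args = TrainingArguments(
    output_dir="./output",
    fp8=True,
    fp8_backend="torchao",
)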

Related Pages

Hiyouga_LLaMA_Factory_Launcher - CLI launcher that consumes these training arguments
Hiyouga_LLaMA_Factory_Model_Loader - Model loader that operates alongside these training settings
