Implementation:Hiyouga LLaMA Factory Training Args
| Knowledge Sources | |
|---|---|
| Domains | Training Configuration, Distributed Training |
| Last Updated | 2026-02-06 19:00 GMT |
Overview
Extends HuggingFace's Seq2SeqTrainingArguments with LlamaFactory-specific support for Ray distributed training, FP8 mixed precision, and MCA (Megatron-Core Adapter) backends.
Description
This module composes three dataclass-based argument groups into a single TrainingArguments class through multiple inheritance: RayArguments, Fp8Arguments, and a base training-arguments class. RayArguments configures Ray-based distributed training, including the worker count, the keyword arguments passed to ray.init(), and the master address and port. Fp8Arguments enables FP8 mixed precision training via HuggingFace Accelerate, targeting PyTorch 2.7+ on Hopper architecture GPUs. The base class is selected dynamically: when the environment variable USE_MCA is set, the module uses McaSeq2SeqTrainingArguments from mcore_adapter; otherwise it defaults to HuggingFace's standard Seq2SeqTrainingArguments.
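The composition pattern described above can be sketched with stand-in classes. This is a minimal illustration, not the module's code: StandardBase, McaBase, RayGroup, Fp8Group, select_base, and build_training_arguments are hypothetical names, and the set of values treated as "enabled" for USE_MCA is an assumption.

```python
import os
from dataclasses import dataclass, fields


# Stand-ins for the two possible base classes (hypothetical, illustration only).
@dataclass
class StandardBase:
    output_dir: str = "./output"


@dataclass
class McaBase(StandardBase):
    mca_enabled: bool = True  # hypothetical field marking the alternate base


@dataclass
class RayGroup:
    ray_num_workers: int = 1


@dataclass
class Fp8Group:
    fp8: bool = False


def select_base(env=None):
    """Pick the base class from USE_MCA.

    Treating "1", "true", and "y" as enabled is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    enabled = env.get("USE_MCA", "0").lower() in ("1", "true", "y")
    return McaBase if enabled else StandardBase


def build_training_arguments(env=None):
    """Compose the FP8 group, the Ray group, and the selected base into one dataclass."""
    base = select_base(env)

    @dataclass
    class Composed(Fp8Group, RayGroup, base):
        overwrite_output_dir: bool = False

    return Composed


# Fields from every parent end up on the composed class.
Args = build_training_arguments({"USE_MCA": "1"})
print([f.name for f in fields(Args)])
```

Every field in the sketch has a default, which keeps the merged field order valid regardless of which base is chosen.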
Usage
Use this module when configuring training runs that require Ray distributed training, FP8 precision, or Megatron-Core Adapter integration. The TrainingArguments class is instantiated by the hyperparameter parser and passed to all trainers throughout the framework.
Code Reference
Source Location
- Repository: Hiyouga_LLaMA_Factory
- File: src/llamafactory/hparams/training_args.py
- Lines: 1-101
Signature
@dataclass
class RayArguments:
    ray_num_workers: int = 1
    ray_init_kwargs: dict | str | None = None
    master_addr: str | None = None
    master_port: str | None = None

@dataclass
class Fp8Arguments:
    fp8: bool = False
    fp8_backend: str = "auto"
    fp8_enable_fsdp_float8_all_gather: bool = False

@dataclass
class TrainingArguments(Fp8Arguments, RayArguments, BaseTrainingArguments):
    overwrite_output_dir: bool = False
Import
from llamafactory.hparams.training_args import TrainingArguments, RayArguments, Fp8Arguments
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| ray_num_workers | int | No (default: 1) | Number of workers for Ray distributed training |
| ray_init_kwargs | dict or str or None | No | Arguments passed to ray.init(); accepts JSON string or dict |
| master_addr | str or None | No | Master address for init_process_group in distributed training |
| master_port | str or None | No | Master port for init_process_group in distributed training |
| fp8 | bool | No (default: False) | Enable FP8 mixed precision training via HuggingFace Accelerate |
| fp8_backend | str | No (default: "auto") | FP8 backend selection: auto, torchao, te, or msamp |
| fp8_enable_fsdp_float8_all_gather | bool | No (default: False) | Enable FP8 optimizations for FSDP2 all-gather operations |
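The table notes that ray_init_kwargs accepts either a JSON string or a dict, and that fp8_backend takes one of four values. A minimal sketch of how such inputs might be normalized and checked follows; normalize_ray_init_kwargs and check_fp8_backend are hypothetical helpers written for illustration and are not part of the module.

```python
import json

# Backend names as listed in the Inputs table.
ALLOWED_FP8_BACKENDS = ("auto", "torchao", "te", "msamp")


def normalize_ray_init_kwargs(value):
    """Return a dict (or None) from a dict, a JSON object string, or None."""
    if value is None or isinstance(value, dict):
        return value
    if isinstance(value, str):
        parsed = json.loads(value)
        if not isinstance(parsed, dict):
            raise ValueError("ray_init_kwargs must decode to a JSON object")
        return parsed
    raise TypeError(f"unsupported type: {type(value).__name__}")


def check_fp8_backend(name):
    """Return the backend name unchanged, or raise if it is not a listed value."""
    if name not in ALLOWED_FP8_BACKENDS:
        raise ValueError(f"fp8_backend must be one of {ALLOWED_FP8_BACKENDS}, got {name!r}")
    return name


print(normalize_ray_init_kwargs('{"num_cpus": 16}'))
print(check_fp8_backend("torchao"))
```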
Outputs
| Name | Type | Description |
|---|---|---|
| TrainingArguments instance | TrainingArguments | Fully composed training arguments dataclass combining Ray, FP8, and base HF training settings |
| use_ray | bool | Auto-detected attribute set in RayArguments.__post_init__ indicating whether Ray is active |
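Because use_ray is set in __post_init__ rather than declared as a field, it never appears in the constructor signature. The sketch below shows that pattern with stand-in classes. The environment variable name USE_RAY, the accepted truthy values, and the explicit chaining of parent __post_init__ calls are all assumptions of this sketch, not confirmed details of the module.

```python
import os
from dataclasses import dataclass, fields


@dataclass
class RaySketch:
    ray_num_workers: int = 1

    def __post_init__(self):
        # Not a dataclass field: derived at construction time.
        # Reading an environment variable named USE_RAY is an assumption.
        self.use_ray = os.environ.get("USE_RAY", "0").lower() in ("1", "true", "y")


@dataclass
class BaseSketch:
    output_dir: str = "./output"

    def __post_init__(self):
        # Stand-in for whatever validation the real base class performs.
        self.output_dir = os.path.expanduser(self.output_dir)


@dataclass
class ComposedSketch(RaySketch, BaseSketch):
    def __post_init__(self):
        # Neither parent calls super().__post_init__(), so each is invoked explicitly.
        RaySketch.__post_init__(self)
        BaseSketch.__post_init__(self)


args = ComposedSketch()
print(args.use_ray, [f.name for f in fields(ComposedSketch)])
```

Without the explicit calls, only the first __post_init__ found in the method resolution order would run, and the other parent's setup would be skipped silently.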
Usage Examples
# Basic training arguments with default settings
from llamafactory.hparams.training_args import TrainingArguments

args = TrainingArguments(
    output_dir="./output",
    per_device_train_batch_size=4,
    learning_rate=5e-5,
)

# With Ray distributed training
args = TrainingArguments(
    output_dir="./output",
    ray_num_workers=4,
    ray_init_kwargs='{"num_cpus": 16}',
)

# With FP8 mixed precision on Hopper GPUs
args = TrainingArguments(
    output_dir="./output",
    fp8=True,
    fp8_backend="torchao",
)
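Since the FP8 options target PyTorch 2.7+ on Hopper GPUs, it can help to check the environment before setting fp8=True. The following is a minimal sketch of such a check as a pure function; parse_version and meets_fp8_targets are hypothetical helpers, and accepting compute capabilities newer than Hopper's 9.0 is an assumption. In a real run the two inputs would typically come from torch.__version__ and torch.cuda.get_device_capability().

```python
MIN_TORCH = (2, 7)                 # PyTorch 2.7+, as stated in the Description
HOPPER_COMPUTE_CAPABILITY = (9, 0)  # Hopper GPUs report compute capability 9.0


def parse_version(text):
    """Turn a version string such as '2.7.1+cu126' into (2, 7).

    Patch numbers and local build tags are ignored.
    """
    major, minor = text.split("+")[0].split(".")[:2]
    return int(major), int(minor)


def meets_fp8_targets(torch_version, compute_capability):
    """Return True when both the PyTorch version and the GPU meet the stated targets.

    Treating capabilities above 9.0 as acceptable is an assumption of this sketch.
    """
    version_ok = parse_version(torch_version) >= MIN_TORCH
    gpu_ok = tuple(compute_capability) >= HOPPER_COMPUTE_CAPABILITY
    return version_ok and gpu_ok


print(meets_fp8_targets("2.7.1+cu126", (9, 0)))
print(meets_fp8_targets("2.6.0", (9, 0)))
```

Comparing versions as integer tuples avoids the string-comparison trap where "2.10" would sort before "2.7".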
Related Pages
- Hiyouga_LLaMA_Factory_Launcher - CLI launcher that consumes these training arguments
- Hiyouga_LLaMA_Factory_Model_Loader - Model loader that operates alongside these training settings