Implementation:Huggingface Transformers TrainingArguments

Domains NLP, Training, MLOps
Last Updated 2026-02-13 00:00 GMT

Overview

A concrete tool from the HuggingFace Transformers library for specifying all training hyperparameters and infrastructure settings in a single configuration object.

Description

TrainingArguments is a dataclass that centralizes every tunable parameter of the Trainer's training loop. It covers training duration, optimization, precision, checkpointing, logging, evaluation, distributed training, and Hub integration. The class supports construction from Python code, command-line arguments (via HfArgumentParser), or deserialization from JSON. All fields have sensible defaults, so a minimal configuration typically specifies only the output_dir parameter.
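
Deserialization from JSON can be done through HfArgumentParser; the config.json path below is illustrative, a minimal sketch assuming the file's keys match TrainingArguments field names:

from transformers import HfArgumentParser, TrainingArguments

# Hypothetical config.json, e.g. {"output_dir": "./results", "learning_rate": 2e-05}
parser = HfArgumentParser(TrainingArguments)
args = parser.parse_json_file("config.json")[0]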

The class also performs automatic validation at initialization time, detecting incompatible settings such as conflicting precision modes, missing evaluation strategies when load_best_model_at_end is enabled, or unsupported hardware configurations.
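
As an example of this validation, requesting both half-precision modes at once fails at construction time; this is a minimal sketch, and the exact error message may differ between versions:

from transformers import TrainingArguments

try:
    # fp16 and bf16 are mutually exclusive mixed-precision modes
    TrainingArguments(output_dir="./results", fp16=True, bf16=True)
except ValueError as err:
    print(err)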

Usage

Create a TrainingArguments instance before initializing the Trainer. Use it to control all aspects of the training run, from basic hyperparameters to advanced distributed training configurations.
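
A typical flow constructs the arguments first and hands them to the Trainer. The model name and the train_dataset / eval_dataset objects below are placeholders for whatever the surrounding script provides; this is a minimal sketch, not a complete training script:

from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    eval_strategy="epoch",
)

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # placeholder: a tokenized training dataset
    eval_dataset=eval_dataset,    # placeholder: a tokenized evaluation dataset
)
trainer.train()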

Code Reference

Source Location

  • Repository: transformers
  • File: src/transformers/training_args.py (lines 178-748, class definition and docstring; fields continue to ~line 1200)

Signature

@dataclass
class TrainingArguments:
    output_dir: str | None = None
    per_device_train_batch_size: int = 8
    per_device_eval_batch_size: int = 8
    num_train_epochs: float = 3.0
    max_steps: int = -1
    learning_rate: float = 5e-5
    lr_scheduler_type: str = "linear"
    warmup_steps: int = 0
    weight_decay: float = 0.0
    optim: str = "adamw_torch"
    gradient_accumulation_steps: int = 1
    max_grad_norm: float = 1.0
    bf16: bool = False
    fp16: bool = False
    logging_strategy: str = "steps"
    logging_steps: int = 500
    eval_strategy: str = "no"
    eval_steps: int | None = None
    save_strategy: str = "steps"
    save_steps: int = 500
    save_total_limit: int | None = None
    load_best_model_at_end: bool = False
    metric_for_best_model: str | None = None
    seed: int = 42
    push_to_hub: bool = False
    report_to: str | list[str] = "none"
    gradient_checkpointing: bool = False
    deepspeed: str | dict | None = None
    fsdp: str | list[str] | None = None
    torch_compile: bool = False
    # ... and many more fields

Import

from transformers import TrainingArguments
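
Once imported, the resolved configuration can be inspected or serialized; a minimal sketch using the class's to_json_string helper:

from transformers import TrainingArguments

args = TrainingArguments(output_dir="./results")
# Dump the fully resolved configuration (defaults included) for logging or reuse
print(args.to_json_string())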

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| output_dir | str | Yes | Directory for model checkpoints and predictions |
| num_train_epochs | float | No | Number of training epochs (default: 3.0) |
| per_device_train_batch_size | int | No | Batch size per device for training (default: 8) |
| per_device_eval_batch_size | int | No | Batch size per device for evaluation (default: 8) |
| learning_rate | float | No | Initial learning rate for the optimizer (default: 5e-5) |
| lr_scheduler_type | str | No | Learning rate scheduler type: "linear", "cosine", "constant", etc. (default: "linear") |
| warmup_steps | int | No | Number of warmup steps for the learning rate scheduler (default: 0) |
| weight_decay | float | No | Weight decay coefficient (default: 0.0) |
| optim | str | No | Optimizer name: "adamw_torch", "adamw_torch_fused", "adafactor", etc. (default: "adamw_torch") |
| gradient_accumulation_steps | int | No | Number of gradient accumulation steps before each optimizer update (default: 1) |
| max_grad_norm | float | No | Maximum gradient norm for clipping (default: 1.0) |
| bf16 | bool | No | Enable bfloat16 mixed-precision training (default: False) |
| fp16 | bool | No | Enable float16 mixed-precision training (default: False) |
| eval_strategy | str | No | When to evaluate: "no", "steps", or "epoch" (default: "no") |
| save_strategy | str | No | When to save checkpoints: "no", "steps", "epoch", or "best" (default: "steps") |
| logging_steps | int | No | Log every N steps (default: 500) |
| seed | int | No | Random seed for reproducibility (default: 42) |
| push_to_hub | bool | No | Push the model to the HuggingFace Hub on save (default: False) |
| deepspeed | str or dict | No | Path to a DeepSpeed config file, or a config dict |
| fsdp | str or list | No | FSDP sharding strategy |
| gradient_checkpointing | bool | No | Enable gradient checkpointing to save memory (default: False) |
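
As a worked example of how the batch-size fields combine, the effective global batch size is per_device_train_batch_size × gradient_accumulation_steps × the number of devices; the values below are illustrative:

per_device_train_batch_size = 16
gradient_accumulation_steps = 4
num_devices = 2  # e.g. two GPUs under data parallelism

# Samples contributing to each optimizer update
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices
print(effective_batch_size)  # 128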

Outputs

| Name | Type | Description |
|------|------|-------------|
| args | TrainingArguments | A fully validated configuration object ready to be passed to the Trainer constructor |

Usage Examples

Basic Usage

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=5e-5,
)

Advanced Configuration with Mixed Precision and Evaluation

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_steps=500,
    weight_decay=0.01,
    bf16=True,
    eval_strategy="steps",
    eval_steps=500,
    save_strategy="steps",
    save_steps=500,
    save_total_limit=3,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    logging_steps=100,
    report_to="wandb",
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
)

Command-Line Usage with HfArgumentParser

from transformers import HfArgumentParser, TrainingArguments

parser = HfArgumentParser(TrainingArguments)
# Every TrainingArguments field becomes a command-line flag, e.g.
# python train.py --output_dir ./results --learning_rate 2e-5 --num_train_epochs 3
args = parser.parse_args_into_dataclasses()[0]
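
HfArgumentParser also accepts several dataclasses at once, which lets script-specific options live alongside TrainingArguments; the ScriptArguments dataclass below is hypothetical and only illustrates the pattern:

from dataclasses import dataclass, field

from transformers import HfArgumentParser, TrainingArguments

@dataclass
class ScriptArguments:
    # Hypothetical script-level options, not part of transformers
    model_name_or_path: str = field(default="distilbert-base-uncased")
    max_seq_length: int = field(default=128)

parser = HfArgumentParser((ScriptArguments, TrainingArguments))
script_args, training_args = parser.parse_args_into_dataclasses()

Both groups of flags then come from the same command line, e.g. python train.py --output_dir ./results --max_seq_length 256.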

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
