Implementation:Huggingface Transformers TrainingArguments
| Knowledge Sources | |
|---|---|
| Domains | NLP, Training, MLOps |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Concrete tool for specifying all training hyperparameters and infrastructure settings in a single configuration object, provided by the HuggingFace Transformers library.
Description
TrainingArguments is a dataclass that centralizes every tunable parameter of the Trainer's training loop: training duration, optimization, precision, checkpointing, logging, evaluation, distributed training, and Hub integration. Instances can be constructed from Python code, from command-line arguments (via HfArgumentParser), or by deserializing a JSON file. All fields have sensible defaults, so a minimal configuration requires only the output_dir parameter.
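The JSON-deserialization pattern can be sketched in plain Python. The MiniArgs dataclass and from_json helper below are hypothetical stand-ins for illustration, not the real TrainingArguments or HfArgumentParser code:

```python
import json
from dataclasses import dataclass, fields

# Hypothetical miniature of TrainingArguments: a dataclass with defaults
# whose fields can be populated from a JSON config.
@dataclass
class MiniArgs:
    output_dir: str = "./results"
    learning_rate: float = 5e-5
    num_train_epochs: float = 3.0

def from_json(cls, text: str):
    """Populate dataclass fields from a JSON object, ignoring unknown keys."""
    raw = json.loads(text)
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in raw.items() if k in known})

args = from_json(MiniArgs, '{"learning_rate": 2e-05, "num_train_epochs": 1}')
print(args.learning_rate)   # fields absent from the JSON keep their defaults
```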
The class also performs automatic validation at initialization time, detecting incompatible settings such as conflicting precision modes, missing evaluation strategies when load_best_model_at_end is enabled, or unsupported hardware configurations.
Usage
Create a TrainingArguments instance before initializing the Trainer. Use it to control all aspects of the training run, from basic hyperparameters to advanced distributed training configurations.
Code Reference
Source Location
- Repository: transformers
- File: src/transformers/training_args.py (lines 178-748, class definition and docstring; fields continue to ~line 1200)
Signature
@dataclass
class TrainingArguments:
    output_dir: str | None = None
    per_device_train_batch_size: int = 8
    per_device_eval_batch_size: int = 8
    num_train_epochs: float = 3.0
    max_steps: int = -1
    learning_rate: float = 5e-5
    lr_scheduler_type: str = "linear"
    warmup_steps: int = 0
    weight_decay: float = 0.0
    optim: str = "adamw_torch"
    gradient_accumulation_steps: int = 1
    max_grad_norm: float = 1.0
    bf16: bool = False
    fp16: bool = False
    logging_strategy: str = "steps"
    logging_steps: int = 500
    eval_strategy: str = "no"
    eval_steps: int | None = None
    save_strategy: str = "steps"
    save_steps: int = 500
    save_total_limit: int | None = None
    load_best_model_at_end: bool = False
    metric_for_best_model: str | None = None
    seed: int = 42
    push_to_hub: bool = False
    report_to: str | list[str] = "none"
    gradient_checkpointing: bool = False
    deepspeed: str | dict | None = None
    fsdp: str | list[str] | None = None
    torch_compile: bool = False
    # ... and many more fields
Import
from transformers import TrainingArguments
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| output_dir | str | Yes | Directory for model checkpoints and predictions |
| num_train_epochs | float | No | Number of training epochs (default: 3.0) |
| per_device_train_batch_size | int | No | Batch size per device for training (default: 8) |
| per_device_eval_batch_size | int | No | Batch size per device for evaluation (default: 8) |
| learning_rate | float | No | Initial learning rate for the optimizer (default: 5e-5) |
| lr_scheduler_type | str | No | Learning rate scheduler type: "linear", "cosine", "constant", etc. (default: "linear") |
| warmup_steps | int | No | Number of warmup steps for the learning rate scheduler (default: 0) |
| weight_decay | float | No | Weight decay coefficient (default: 0.0) |
| optim | str | No | Optimizer name: "adamw_torch", "adamw_torch_fused", "adafactor", etc. (default: "adamw_torch") |
| gradient_accumulation_steps | int | No | Number of gradient accumulation steps before optimizer update (default: 1) |
| max_grad_norm | float | No | Maximum gradient norm for clipping (default: 1.0) |
| bf16 | bool | No | Enable bfloat16 mixed-precision training (default: False) |
| fp16 | bool | No | Enable float16 mixed-precision training (default: False) |
| eval_strategy | str | No | When to evaluate: "no", "steps", or "epoch" (default: "no") |
| save_strategy | str | No | When to save checkpoints: "no", "steps", "epoch", or "best" (default: "steps") |
| logging_steps | int | No | Log every N steps (default: 500) |
| seed | int | No | Random seed for reproducibility (default: 42) |
| push_to_hub | bool | No | Push model to HuggingFace Hub on save (default: False) |
| deepspeed | str or dict | No | Path to DeepSpeed config file or config dict |
| fsdp | str or list | No | FSDP sharding strategy |
| gradient_checkpointing | bool | No | Enable gradient checkpointing to save memory (default: False) |
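Several of the inputs above interact: the batch size the optimizer actually sees is the per-device size multiplied by the accumulation steps and device count, and max_steps, when positive, overrides num_train_epochs. The sketch below is pure arithmetic, independent of the library; Trainer's exact dataloader-based step count can differ by a step or two:

```python
import math

def effective_batch_size(per_device: int, accum: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer update."""
    return per_device * accum * num_devices

def optimizer_steps(num_examples: int, per_device: int, accum: int,
                    num_devices: int, epochs: float, max_steps: int = -1) -> int:
    if max_steps > 0:  # a positive max_steps overrides num_train_epochs
        return max_steps
    per_epoch = math.ceil(num_examples / effective_batch_size(per_device, accum, num_devices))
    return int(per_epoch * epochs)

print(effective_batch_size(8, 4, 2))                # 64
print(optimizer_steps(10_000, 8, 4, 2, epochs=3))   # ceil(10000/64) * 3 = 471
```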
Outputs
| Name | Type | Description |
|---|---|---|
| args | TrainingArguments | A fully validated configuration object ready to be passed to the Trainer constructor |
Usage Examples
Basic Usage
from transformers import TrainingArguments
args = TrainingArguments(
output_dir="./results",
num_train_epochs=3,
per_device_train_batch_size=8,
learning_rate=5e-5,
)
Advanced Configuration with Mixed Precision and Evaluation
from transformers import TrainingArguments
args = TrainingArguments(
output_dir="./results",
num_train_epochs=3,
per_device_train_batch_size=16,
per_device_eval_batch_size=16,
learning_rate=2e-5,
lr_scheduler_type="cosine",
warmup_steps=500,
weight_decay=0.01,
bf16=True,
eval_strategy="steps",
eval_steps=500,
save_strategy="steps",
save_steps=500,
save_total_limit=3,
load_best_model_at_end=True,
metric_for_best_model="eval_loss",
logging_steps=100,
report_to="wandb",
gradient_accumulation_steps=4,
gradient_checkpointing=True,
)
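The warmup_steps=500 plus lr_scheduler_type="cosine" combination above implies linear warmup to the base rate followed by cosine decay toward zero. The function below sketches that shape only; it mirrors the general form of a warmup-plus-cosine schedule, not the library's implementation:

```python
import math

def lr_at(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to base_lr, then cosine decay toward zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Midway through warmup the rate is half of base_lr; at total_steps it is ~0.
print(lr_at(250, 2e-5, 500, 10_000))
print(lr_at(10_000, 2e-5, 500, 10_000))
```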
Command-Line Usage with HfArgumentParser
from transformers import HfArgumentParser, TrainingArguments
parser = HfArgumentParser(TrainingArguments)
args = parser.parse_args_into_dataclasses()[0]
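With this parser, every TrainingArguments field becomes a command-line flag. Assuming the snippet above is saved as train.py (a hypothetical filename), an invocation might look like:

```shell
python train.py \
  --output_dir ./results \
  --num_train_epochs 3 \
  --per_device_train_batch_size 16 \
  --learning_rate 2e-5 \
  --bf16
```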
Related Pages
Implements Principle
Requires Environment
- Environment:Huggingface_Transformers_Python_310_Runtime
- Environment:Huggingface_Transformers_PyTorch_24_CUDA