Implementation:Huggingface Transformers TrainingArguments
| Knowledge Sources | |
|---|---|
| Domains | NLP, Training, MLOps |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Concrete tool for specifying all training hyperparameters and infrastructure settings in a single configuration object, provided by the HuggingFace Transformers library.
Description
TrainingArguments is a dataclass that centralizes every tunable parameter of the Trainer's training loop: training duration, optimization, precision, checkpointing, logging, evaluation, distributed training, and Hub integration. Instances can be constructed from Python code, from command-line arguments (via HfArgumentParser), or by deserializing a JSON file. All fields have sensible defaults, so a minimal configuration requires only the output_dir parameter.
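The JSON-deserialization pattern can be sketched in plain Python. The MiniArgs dataclass and from_json helper below are hypothetical stand-ins for illustration, not the real TrainingArguments or HfArgumentParser code:

```python
import json
from dataclasses import dataclass, fields

# Hypothetical miniature of TrainingArguments: a dataclass with defaults
# whose fields can be populated from a JSON config.
@dataclass
class MiniArgs:
    output_dir: str = "./results"
    learning_rate: float = 5e-5
    num_train_epochs: float = 3.0

def from_json(cls, text: str):
    """Populate dataclass fields from a JSON object, ignoring unknown keys."""
    raw = json.loads(text)
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in raw.items() if k in known})

args = from_json(MiniArgs, '{"learning_rate": 2e-05, "num_train_epochs": 1}')
print(args.learning_rate)   # fields absent from the JSON keep their defaults
```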
The class also performs automatic validation at initialization time, detecting incompatible settings such as conflicting precision modes, missing evaluation strategies when load_best_model_at_end is enabled, or unsupported hardware configurations.
Usage
Create a TrainingArguments instance before initializing the Trainer. Use it to control all aspects of the training run, from basic hyperparameters to advanced distributed training configurations.
Code Reference
Source Location
- Repository: transformers
- File: src/transformers/training_args.py (lines 178-748, class definition and docstring; fields continue to ~line 1200)
Signature
@dataclass
class TrainingArguments:
    output_dir: str | None = None
    per_device_train_batch_size: int = 8
    per_device_eval_batch_size: int = 8
    num_train_epochs: float = 3.0
    max_steps: int = -1
    learning_rate: float = 5e-5
    lr_scheduler_type: str = "linear"
    warmup_steps: int = 0
    weight_decay: float = 0.0
    optim: str = "adamw_torch"
    gradient_accumulation_steps: int = 1
    max_grad_norm: float = 1.0
    bf16: bool = False
    fp16: bool = False
    logging_strategy: str = "steps"
    logging_steps: int = 500
    eval_strategy: str = "no"
    eval_steps: int | None = None
    save_strategy: str = "steps"
    save_steps: int = 500
    save_total_limit: int | None = None
    load_best_model_at_end: bool = False
    metric_for_best_model: str | None = None
    seed: int = 42
    push_to_hub: bool = False
    report_to: str | list[str] = "none"
    gradient_checkpointing: bool = False
    deepspeed: str | dict | None = None
    fsdp: str | list[str] | None = None
    torch_compile: bool = False
    # ... and many more fields
Import
from transformers import TrainingArguments
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| output_dir | str | Yes | Directory for model checkpoints and predictions |
| num_train_epochs | float | No | Number of training epochs (default: 3.0) |
| per_device_train_batch_size | int | No | Batch size per device for training (default: 8) |
| per_device_eval_batch_size | int | No | Batch size per device for evaluation (default: 8) |
| learning_rate | float | No | Initial learning rate for the optimizer (default: 5e-5) |
| lr_scheduler_type | str | No | Learning rate scheduler type: "linear", "cosine", "constant", etc. (default: "linear") |
| warmup_steps | int | No | Number of warmup steps for the learning rate scheduler (default: 0) |
| weight_decay | float | No | Weight decay coefficient (default: 0.0) |
| optim | str | No | Optimizer name: "adamw_torch", "adamw_torch_fused", "adafactor", etc. (default: "adamw_torch") |
| gradient_accumulation_steps | int | No | Number of gradient accumulation steps before optimizer update (default: 1) |
| max_grad_norm | float | No | Maximum gradient norm for clipping (default: 1.0) |
| bf16 | bool | No | Enable bfloat16 mixed-precision training (default: False) |
| fp16 | bool | No | Enable float16 mixed-precision training (default: False) |
| eval_strategy | str | No | When to evaluate: "no", "steps", or "epoch" (default: "no") |
| save_strategy | str | No | When to save checkpoints: "no", "steps", "epoch", or "best" (default: "steps") |
| logging_steps | int | No | Log every N steps (default: 500) |
| seed | int | No | Random seed for reproducibility (default: 42) |
| push_to_hub | bool | No | Push model to HuggingFace Hub on save (default: False) |
| deepspeed | str or dict | No | Path to DeepSpeed config file or config dict |
| fsdp | str or list | No | FSDP sharding strategy |
| gradient_checkpointing | bool | No | Enable gradient checkpointing to save memory (default: False) |
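Several of the inputs above interact: the batch size the optimizer actually sees is the per-device size multiplied by the accumulation steps and device count, and max_steps, when positive, overrides num_train_epochs. The sketch below is pure arithmetic, independent of the library; Trainer's exact dataloader-based step count can differ by a step or two:

```python
import math

def effective_batch_size(per_device: int, accum: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer update."""
    return per_device * accum * num_devices

def optimizer_steps(num_examples: int, per_device: int, accum: int,
                    num_devices: int, epochs: float, max_steps: int = -1) -> int:
    if max_steps > 0:  # a positive max_steps overrides num_train_epochs
        return max_steps
    per_epoch = math.ceil(num_examples / effective_batch_size(per_device, accum, num_devices))
    return int(per_epoch * epochs)

print(effective_batch_size(8, 4, 2))                # 64
print(optimizer_steps(10_000, 8, 4, 2, epochs=3))   # ceil(10000/64) * 3 = 471
```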
Outputs
| Name | Type | Description |
|---|---|---|
| args | TrainingArguments | A fully validated configuration object ready to be passed to the Trainer constructor |
Usage Examples
Basic Usage
from transformers import TrainingArguments
args = TrainingArguments(
output_dir="./results",
num_train_epochs=3,
per_device_train_batch_size=8,
learning_rate=5e-5,
)
Advanced Configuration with Mixed Precision and Evaluation
from transformers import TrainingArguments
args = TrainingArguments(
output_dir="./results",
num_train_epochs=3,
per_device_train_batch_size=16,
per_device_eval_batch_size=16,
learning_rate=2e-5,
lr_scheduler_type="cosine",
warmup_steps=500,
weight_decay=0.01,
bf16=True,
eval_strategy="steps",
eval_steps=500,
save_strategy="steps",
save_steps=500,
save_total_limit=3,
load_best_model_at_end=True,
metric_for_best_model="eval_loss",
logging_steps=100,
report_to="wandb",
gradient_accumulation_steps=4,
gradient_checkpointing=True,
)
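The warmup_steps=500 plus lr_scheduler_type="cosine" combination above implies linear warmup to the base rate followed by cosine decay toward zero. The function below sketches that shape only; it mirrors the general form of a warmup-plus-cosine schedule, not the library's implementation:

```python
import math

def lr_at(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to base_lr, then cosine decay toward zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Midway through warmup the rate is half of base_lr; at total_steps it is ~0.
print(lr_at(250, 2e-5, 500, 10_000))
print(lr_at(10_000, 2e-5, 500, 10_000))
```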
Command-Line Usage with HfArgumentParser
from transformers import HfArgumentParser, TrainingArguments
parser = HfArgumentParser(TrainingArguments)
args = parser.parse_args_into_dataclasses()[0]
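With this parser, every TrainingArguments field becomes a command-line flag. Assuming the snippet above is saved as train.py (a hypothetical filename), an invocation might look like:

```shell
python train.py \
  --output_dir ./results \
  --num_train_epochs 3 \
  --per_device_train_batch_size 16 \
  --learning_rate 2e-5 \
  --bf16
```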
Related Pages
Implements Principle
Requires Environment
- Environment:Huggingface_Transformers_Python_310_Runtime
- Environment:Huggingface_Transformers_PyTorch_24_CUDA