Implementation: Hugging Face Transformers Trainer Init
| Knowledge Sources | |
|---|---|
| Domains | NLP, Training, Software Architecture |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
A concrete tool, provided by the Hugging Face Transformers library, for assembling a model, datasets, configuration, and auxiliary components into a managed training orchestrator.
Description
Trainer.__init__() is the constructor that wires together all components needed for training. It performs extensive initialization including argument validation, accelerator setup, model placement across devices, data collator selection, callback registration, Hub repository creation, and training state initialization. The constructor follows an eleven-phase staged initialization pattern to ensure dependencies are resolved in the correct order.
The Trainer class is optimized for PreTrainedModel instances but also supports standard torch.nn.Module models. It uses dependency injection: the caller provides the model, datasets, and configuration, and the Trainer manages the orchestration of the training loop.
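As a minimal sketch of the plain torch.nn.Module support mentioned above (the TinyRegressor class, its field names, and the in-memory list dataset are illustrative assumptions, not library components), a custom module works as long as its forward() returns a dict containing a "loss" key when labels are supplied:
import torch
from torch import nn
from transformers import Trainer, TrainingArguments

class TinyRegressor(nn.Module):
    # Hypothetical toy model: Trainer only needs forward() to return a dict
    # with a "loss" entry when labels are present.
    def __init__(self, in_features=4, hidden=16):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, features=None, labels=None):
        preds = self.body(features).squeeze(-1)
        output = {"logits": preds}
        if labels is not None:
            output["loss"] = nn.functional.mse_loss(preds, labels.float())
        return output

# Tiny in-memory dataset: a list of dicts whose keys match forward()'s argument names.
train_data = [{"features": torch.randn(4), "labels": torch.randn(())} for _ in range(32)]
trainer = Trainer(
    model=TinyRegressor(),
    args=TrainingArguments(output_dir="./toy", num_train_epochs=1, report_to="none"),
    train_dataset=train_data,
)
trainer.train()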
Usage
Instantiate a Trainer once you have all the required components ready: a model (or model_init function), training arguments, and at least a training dataset. Optionally provide an evaluation dataset, processing class (tokenizer), compute_metrics function, and custom callbacks.
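A minimal sketch of that contract (model and train_dataset are placeholders assumed to be defined elsewhere): only the model and a training dataset are strictly required, and every omitted argument falls back to the defaults listed in the I/O contract below.
from transformers import Trainer
# args is omitted, so a default TrainingArguments(output_dir="tmp_trainer") is created;
# the data collator and optimizer are likewise chosen automatically.
trainer = Trainer(model=model, train_dataset=train_dataset)
trainer.train()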
Code Reference
Source Location
- Repository: transformers
- File: src/transformers/trainer.py (lines 256-400+, class definition and __init__)
Signature
class Trainer:
    def __init__(
        self,
        model: PreTrainedModel | nn.Module | None = None,
        args: TrainingArguments | None = None,
        data_collator: DataCollator | None = None,
        train_dataset: Dataset | IterableDataset | datasets.Dataset | None = None,
        eval_dataset: Dataset | dict[str, Dataset] | datasets.Dataset | None = None,
        processing_class: PreTrainedTokenizerBase
        | BaseImageProcessor
        | FeatureExtractionMixin
        | ProcessorMixin
        | None = None,
        model_init: Callable[..., PreTrainedModel] | None = None,
        compute_loss_func: Callable | None = None,
        compute_metrics: Callable[[EvalPrediction], dict] | None = None,
        callbacks: list[TrainerCallback] | None = None,
        optimizers: tuple[torch.optim.Optimizer | None, torch.optim.lr_scheduler.LambdaLR | None] = (None, None),
        optimizer_cls_and_kwargs: tuple[type[torch.optim.Optimizer], dict[str, Any]] | None = None,
        preprocess_logits_for_metrics: Callable[[torch.Tensor, torch.Tensor], torch.Tensor] | None = None,
    ):
Import
from transformers import Trainer
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | PreTrainedModel or nn.Module | No* | The model to train. Either model or model_init must be provided |
| args | TrainingArguments | No | Training configuration. Defaults to a basic TrainingArguments with output_dir="tmp_trainer" |
| data_collator | DataCollator | No | Function to form batches. Defaults to default_data_collator or DataCollatorWithPadding if a tokenizer is provided |
| train_dataset | Dataset | No | Training dataset. Columns not accepted by model.forward() are automatically removed |
| eval_dataset | Dataset or dict[str, Dataset] | No | Evaluation dataset(s). If a dict, evaluates on each dataset separately |
| processing_class | PreTrainedTokenizerBase or processor | No | Tokenizer, image processor, or feature extractor. Saved alongside the model for easy reuse |
| model_init | Callable | No* | Function that returns a fresh model instance. Used for hyperparameter search (see the sketch after this table). Either model or model_init must be provided |
| compute_loss_func | Callable | No | Custom loss function accepting model outputs, labels, and accumulated batch size |
| compute_metrics | Callable | No | Function that receives EvalPrediction and returns a dict of metrics |
| callbacks | list[TrainerCallback] | No | Additional callbacks to customize the training loop |
| optimizers | tuple | No | Pre-built (optimizer, scheduler) tuple. Defaults to (None, None) for automatic creation |
| optimizer_cls_and_kwargs | tuple | No | Optimizer class and kwargs tuple. Overrides optim and optim_args in args |
| preprocess_logits_for_metrics | Callable | No | Function to process logits before caching for evaluation |
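As a sketch of the model_init row above (the checkpoint, task, and trial count are illustrative, and the optuna backend is assumed to be installed), hyperparameter search builds a fresh model for every trial through the supplied factory:
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

def model_init():
    # Called once per trial so each run starts from fresh weights.
    return AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

trainer = Trainer(
    model_init=model_init,  # note: model_init instead of model
    args=TrainingArguments(output_dir="./hp_search", eval_strategy="epoch"),
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
best_run = trainer.hyperparameter_search(direction="minimize", backend="optuna", n_trials=5)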
Outputs
| Name | Type | Description |
|---|---|---|
| trainer | Trainer | A fully initialized Trainer instance ready for train(), evaluate(), or predict() calls |
Usage Examples
Basic Usage
from transformers import Trainer, TrainingArguments, AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token
args = TrainingArguments(output_dir="./results", num_train_epochs=3)
# train_dataset / eval_dataset are assumed to be pre-tokenized datasets
# (e.g. datasets.Dataset objects with an "input_ids" column).
trainer = Trainer(
    model=model,
    args=args,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    processing_class=tokenizer,
)
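Once constructed, the same object drives training and evaluation; a typical follow-up looks like this (the save path is illustrative):
trainer.train()                        # run the training loop
metrics = trainer.evaluate()           # returns a dict of eval metrics
trainer.save_model("./results/final")  # saves the model and the processing_class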
With Custom Metrics and Callbacks
import numpy as np
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

def compute_metrics(eval_pred):
    # eval_pred is an EvalPrediction (predictions, label_ids)
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    accuracy = (predictions == labels).mean()
    return {"accuracy": accuracy}

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./results",
        eval_strategy="epoch",
        save_strategy="epoch",  # must match eval_strategy when load_best_model_at_end=True
        load_best_model_at_end=True,
        metric_for_best_model="accuracy",
    ),
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    processing_class=tokenizer,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
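The callbacks argument also accepts custom TrainerCallback subclasses; a minimal sketch (the class name and print format are illustrative) that hooks the logging event:
from transformers import TrainerCallback

class PrintLossCallback(TrainerCallback):
    # on_log fires each time the Trainer emits a log entry (every logging_steps by default).
    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs and "loss" in logs:
            print(f"step {state.global_step}: loss={logs['loss']:.4f}")

trainer.add_callback(PrintLossCallback())  # or pass callbacks=[PrintLossCallback()] at construction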