
Implementation:Huggingface Transformers Trainer Init

From Leeroopedia
Knowledge Sources
Domains NLP, Training, Software Architecture
Last Updated 2026-02-13 00:00 GMT

Overview

Concrete tool for assembling a model, datasets, configuration, and auxiliary components into a managed training orchestrator, provided by the HuggingFace Transformers library.

Description

Trainer.__init__() is the constructor that wires together all components needed for training. It performs extensive initialization including argument validation, accelerator setup, model placement across devices, data collator selection, callback registration, Hub repository creation, and training state initialization. The constructor follows an eleven-phase staged initialization pattern to ensure dependencies are resolved in the correct order.

The Trainer class is optimized for PreTrainedModel instances but also supports standard torch.nn.Module models. It uses dependency injection: the caller provides the model, datasets, and configuration, and the Trainer manages the orchestration of the training loop.
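To illustrate the nn.Module support mentioned above, here is a minimal sketch of a plain PyTorch module shaped the way Trainer expects: when labels are supplied, its forward returns the loss under a "loss" key. The class name, feature size, and MSE objective are illustrative choices, not part of the library.

```python
import torch
from torch import nn

# Hypothetical minimal model for illustration. Trainer can drive any
# nn.Module whose forward computes and returns the loss when labels
# are passed in the batch.
class TinyRegressor(nn.Module):
    def __init__(self, in_features=4):
        super().__init__()
        self.linear = nn.Linear(in_features, 1)

    def forward(self, inputs=None, labels=None):
        preds = self.linear(inputs).squeeze(-1)
        loss = None
        if labels is not None:
            # Loss computed inside forward, as Trainer expects.
            loss = nn.functional.mse_loss(preds, labels)
        return {"loss": loss, "logits": preds}
```

A model like this can then be passed as `model=TinyRegressor()`, with batches whose keys match the forward signature (`inputs`, `labels`).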

Usage

Instantiate a Trainer once you have all the required components ready: a model (or model_init function), training arguments, and at least a training dataset. Optionally provide an evaluation dataset, processing class (tokenizer), compute_metrics function, and custom callbacks.

Code Reference

Source Location

  • Repository: transformers
  • File: src/transformers/trainer.py (lines 256-400+, class definition and __init__)

Signature

class Trainer:
    def __init__(
        self,
        model: PreTrainedModel | nn.Module | None = None,
        args: TrainingArguments | None = None,
        data_collator: DataCollator | None = None,
        train_dataset: Dataset | IterableDataset | datasets.Dataset | None = None,
        eval_dataset: Dataset | dict[str, Dataset] | datasets.Dataset | None = None,
        processing_class: PreTrainedTokenizerBase
            | BaseImageProcessor
            | FeatureExtractionMixin
            | ProcessorMixin
            | None = None,
        model_init: Callable[..., PreTrainedModel] | None = None,
        compute_loss_func: Callable | None = None,
        compute_metrics: Callable[[EvalPrediction], dict] | None = None,
        callbacks: list[TrainerCallback] | None = None,
        optimizers: tuple[torch.optim.Optimizer | None, torch.optim.lr_scheduler.LambdaLR | None] = (None, None),
        optimizer_cls_and_kwargs: tuple[type[torch.optim.Optimizer], dict[str, Any]] | None = None,
        preprocess_logits_for_metrics: Callable[[torch.Tensor, torch.Tensor], torch.Tensor] | None = None,
    ):

Import

from transformers import Trainer

I/O Contract

Inputs

Name Type Required Description
model PreTrainedModel or nn.Module No* The model to train. Either model or model_init must be provided
args TrainingArguments No Training configuration. Defaults to a basic TrainingArguments with output_dir="tmp_trainer"
data_collator DataCollator No Function to form batches. Defaults to default_data_collator or DataCollatorWithPadding if a tokenizer is provided
train_dataset Dataset No Training dataset. Columns not accepted by model.forward() are automatically removed
eval_dataset Dataset or dict[str, Dataset] No Evaluation dataset(s). If a dict, evaluates on each dataset separately
processing_class PreTrainedTokenizerBase or processor No Tokenizer, image processor, or feature extractor. Saved alongside the model for easy reuse
model_init Callable No* Function that returns a fresh model instance. Used for hyperparameter search. Either model or model_init must be provided
compute_loss_func Callable No Custom loss function accepting model outputs, labels, and accumulated batch size
compute_metrics Callable No Function that receives EvalPrediction and returns a dict of metrics
callbacks list[TrainerCallback] No Additional callbacks to customize the training loop
optimizers tuple No Pre-built (optimizer, scheduler) tuple. Defaults to (None, None) for automatic creation
optimizer_cls_and_kwargs tuple No Optimizer class and kwargs tuple. Overrides optim and optim_args in args
preprocess_logits_for_metrics Callable No Function to process logits before caching for evaluation
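The data_collator default described above (DataCollatorWithPadding when a tokenizer is provided) can be approximated by a short sketch. This hypothetical collator is for illustration only; it pads each batch's input_ids to the longest example and builds the matching attention mask, which is the essential behavior of the library's padding collator.

```python
import torch

# Illustrative stand-in for DataCollatorWithPadding: pad "input_ids"
# to the longest sequence in the batch and mask out the padding.
def pad_collator(features, pad_token_id=0):
    max_len = max(len(f["input_ids"]) for f in features)
    input_ids, attention_mask = [], []
    for f in features:
        ids = list(f["input_ids"])
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_token_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {
        "input_ids": torch.tensor(input_ids),
        "attention_mask": torch.tensor(attention_mask),
    }
```

A function with this shape (list of features in, dict of batched tensors out) can be passed directly as the data_collator argument.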

Outputs

Name Type Description
trainer Trainer A fully initialized Trainer instance ready for train(), evaluate(), or predict() calls

Usage Examples

Basic Usage

from transformers import Trainer, TrainingArguments, AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

args = TrainingArguments(output_dir="./results", num_train_epochs=3)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    processing_class=tokenizer,
)

With Custom Metrics and Callbacks

import numpy as np
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    accuracy = float((predictions == labels).mean())
    return {"accuracy": accuracy}

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./results",
        eval_strategy="epoch",
        save_strategy="epoch",  # must match eval_strategy when load_best_model_at_end=True
        load_best_model_at_end=True,
    ),
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    processing_class=tokenizer,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
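The compute_loss_func parameter described in the I/O contract can be sketched along the same lines. This is a hedged example, not the library's implementation: the (outputs, labels, num_items_in_batch) shape follows the parameter description above, with num_items_in_batch allowing the loss to be normalized consistently under gradient accumulation.

```python
import torch
import torch.nn.functional as F

# Illustrative custom loss for compute_loss_func: summed cross-entropy,
# optionally normalized by the accumulated batch size so gradient
# accumulation does not change the effective loss scale.
def compute_loss_func(outputs, labels, num_items_in_batch=None):
    logits = outputs["logits"] if isinstance(outputs, dict) else outputs[0]
    loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        reduction="sum",
    )
    if num_items_in_batch is not None:
        loss = loss / num_items_in_batch
    return loss
```

A function of this shape is then passed as `compute_loss_func=compute_loss_func` when constructing the Trainer.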

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
