Implementation:PacktPublishing LLM Engineers Handbook SFTTrainer Train

Field                Value
Implementation Name  SFTTrainer Train
Type                 Wrapper Doc (TRL library)
Source File          llm_engineering/model/finetuning/finetune.py:L117-199
Workflow             LLM_Finetuning
Repo                 PacktPublishing/LLM-Engineers-Handbook
Implements           Principle:PacktPublishing_LLM_Engineers_Handbook_Supervised_Finetuning

Function Signatures

SFT Training

SFTTrainer(
    model,
    tokenizer,
    train_dataset,
    eval_dataset,
    dataset_text_field: str,
    max_seq_length: int,
    args: TrainingArguments,
).train() -> TrainOutput

DPO Training

DPOTrainer(
    model,
    ref_model: Optional[Model],
    tokenizer,
    beta: float,
    train_dataset,
    eval_dataset,
    args: DPOConfig,
).train() -> TrainOutput

Imports

from trl import SFTTrainer, DPOTrainer, DPOConfig
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
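
is_bfloat16_supported() is imported from Unsloth here; if Unsloth is not installed, PyTorch ships an equivalent probe. A minimal fallback sketch (the try/except wrapper is an assumption, not part of the handbook's code):

try:
    from unsloth import is_bfloat16_supported
except ImportError:
    import torch

    def is_bfloat16_supported() -> bool:
        # PyTorch's own probe: True on Ampere (compute capability >= 8.0) and newer GPUs.
        return torch.cuda.is_available() and torch.cuda.is_bf16_supported()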

Description

The fine-tuning pipeline uses HuggingFace's TRL (Transformer Reinforcement Learning) library to perform both SFT and DPO training. SFTTrainer handles supervised fine-tuning on instruction-response pairs, while DPOTrainer handles preference optimization on chosen/rejected pairs. Both trainers manage the complete training loop including gradient computation, optimizer steps, logging, evaluation, and checkpointing.
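
Both trainers assume the dataset splits are already formatted: SFTTrainer reads a single text column (dataset_text_field="text" below), while DPOTrainer expects prompt, chosen, and rejected columns. A minimal sketch of building the SFT text column from instruction-response pairs (the Alpaca-style template and the column names are illustrative assumptions, not the handbook's exact schema):

from datasets import Dataset

def add_text_column(sample: dict) -> dict:
    # Assumed Alpaca-style template; the repo's actual prompt template may differ.
    sample["text"] = (
        f"### Instruction:\n{sample['instruction']}\n\n"
        f"### Response:\n{sample['output']}"
    )
    return sample

raw = Dataset.from_list([
    {
        "instruction": "Summarize LoRA in one sentence.",
        "output": "LoRA trains small low-rank adapter matrices instead of the full weights.",
    }
])
dataset = raw.map(add_text_column)  # produces the "text" column SFTTrainer reads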

SFT Training Implementation

Key Code

# From llm_engineering/model/finetuning/finetune.py

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        learning_rate=learning_rate,
        num_train_epochs=num_train_epochs,
        per_device_train_batch_size=per_device_train_batch_size,
        gradient_accumulation_steps=8,
        optim="adamw_8bit",
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        report_to="comet_ml",
    ),
)
trainer.train()

SFT Parameters

Parameter                    Type       Value             Description
model                        Model      --                The model with LoRA adapters injected.
tokenizer                    Tokenizer  --                The corresponding tokenizer.
train_dataset                Dataset    dataset["train"]  Training split of the formatted dataset.
eval_dataset                 Dataset    dataset["test"]   Evaluation split for validation loss monitoring.
dataset_text_field           str        "text"            Column name containing the formatted training text.
max_seq_length               int        2048              Maximum sequence length for tokenization/truncation.
learning_rate                float      3e-4              Learning rate for the AdamW optimizer.
num_train_epochs             int        3                 Number of training epochs.
per_device_train_batch_size  int        2                 Batch size per GPU device.
gradient_accumulation_steps  int        8                 Steps to accumulate gradients before each optimizer update; effective batch size = 2 * 8 = 16.
optim                        str        "adamw_8bit"      8-bit AdamW optimizer to reduce memory usage.
fp16 / bf16                  bool       Auto-detected     Mixed-precision training; BF16 preferred when supported (Ampere+ GPUs).
report_to                    str        "comet_ml"        Experiment tracking platform for logging metrics.
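
The batch-size arithmetic above fixes how many optimizer updates a run performs. A quick sanity check (the single-GPU setup and the 10,000-sample dataset size are illustrative assumptions, not figures from the handbook):

per_device_train_batch_size = 2
gradient_accumulation_steps = 8
num_train_epochs = 3
num_gpus = 1                   # assumption: single-GPU run
num_samples = 10_000           # illustrative dataset size

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
optimizer_steps = (num_samples // effective_batch_size) * num_train_epochs
print(effective_batch_size)    # 16
print(optimizer_steps)         # 1875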

DPO Training Implementation

Key Code

# From llm_engineering/model/finetuning/finetune.py

trainer = DPOTrainer(
    model=model,
    ref_model=None,
    tokenizer=tokenizer,
    beta=0.1,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    args=DPOConfig(
        learning_rate=learning_rate,
        num_train_epochs=num_train_epochs,
        per_device_train_batch_size=per_device_train_batch_size,
        gradient_accumulation_steps=8,
        optim="adamw_8bit",
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        report_to="comet_ml",
    ),
)
trainer.train()

DPO-Specific Parameters

Parameter  Type           Value  Description
ref_model  Model or None  None   Reference model for DPO. When None, the reference is implicit: with a LoRA/PEFT model, TRL disables the adapters and scores with the frozen base weights, so no second model copy is kept in memory.
beta       float          0.1    Inverse temperature for the DPO loss. Higher values penalize deviation from the reference model more strongly (more conservative updates); lower values let the policy drift further from the reference (more aggressive preference optimization).
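
The role of beta is clearest in the loss itself, where it scales the policy-vs-reference log-ratio margin. A pedagogical PyTorch sketch of the sigmoid DPO loss (a reimplementation for illustration, not TRL's internal code):

import torch
import torch.nn.functional as F

def dpo_sigmoid_loss(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Log-probability ratios of the policy against the frozen reference.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # beta scales the margin: higher beta penalizes drift from the reference harder.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

loss = dpo_sigmoid_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                        torch.tensor([-13.0]), torch.tensor([-14.0]))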

Returns

trainer.train() returns a TrainOutput object, but the primary effect is in-place modification of the model's LoRA adapter weights. The fine-tuned model is available via the same model reference after training completes.
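
A sketch of consuming the return value and persisting the result (the output path is illustrative; on a PEFT model, save_pretrained writes only the adapter weights):

train_output = trainer.train()
print(train_output.global_step)     # total optimizer steps performed
print(train_output.training_loss)   # average training loss
print(train_output.metrics)         # runtime, throughput, etc.

# `model` now carries the updated LoRA weights in place.
model.save_pretrained("output/finetuned-adapter")      # illustrative path
tokenizer.save_pretrained("output/finetuned-adapter")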

Training Metrics

Both trainers log the following metrics to Comet ML during training; a minimal credential-setup sketch follows the list:

  • Training loss: Per-step and per-epoch loss values.
  • Evaluation loss: Validation loss computed on the test split.
  • Learning rate schedule: Current learning rate at each step.
  • GPU memory usage: Peak memory utilization.
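
With report_to="comet_ml", the transformers integration reads Comet credentials from the environment. A minimal setup sketch using Comet's standard environment variables (the project name is illustrative):

import os

# Assumption: credentials are supplied via environment variables, as the Comet integration expects.
os.environ["COMET_API_KEY"] = "<your-api-key>"
os.environ["COMET_PROJECT_NAME"] = "llm-engineering"   # illustrative project name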

External Dependencies

Package       Purpose
trl           TRL library providing SFTTrainer and DPOTrainer
transformers  TrainingArguments configuration and training infrastructure
unsloth       BF16 support detection via is_bfloat16_supported()
comet_ml      Experiment tracking and metric logging
