Implementation:PacktPublishing LLM Engineers Handbook SFTTrainer Train
| Field | Value |
|---|---|
| Implementation Name | SFTTrainer Train |
| Type | Wrapper Doc (TRL library) |
| Source File | llm_engineering/model/finetuning/finetune.py:L117-199 |
| Workflow | LLM_Finetuning |
| Repo | PacktPublishing/LLM-Engineers-Handbook |
| Implements | Principle:PacktPublishing_LLM_Engineers_Handbook_Supervised_Finetuning |
Function Signatures
SFT Training

```python
SFTTrainer(
    model,
    tokenizer,
    train_dataset,
    eval_dataset,
    dataset_text_field: str,
    max_seq_length: int,
    args: TrainingArguments,
).train() -> TrainOutput
```

DPO Training

```python
DPOTrainer(
    model,
    ref_model: Optional[Model],
    tokenizer,
    beta: float,
    train_dataset,
    eval_dataset,
    args: DPOConfig,
).train() -> TrainOutput
```

Imports

```python
from trl import SFTTrainer, DPOTrainer, DPOConfig
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
```
Description
The fine-tuning pipeline uses HuggingFace's TRL (Transformer Reinforcement Learning) library to perform both SFT and DPO training. SFTTrainer handles supervised fine-tuning on instruction-response pairs, while DPOTrainer handles preference optimization on chosen/rejected pairs. Both trainers manage the complete training loop including gradient computation, optimizer steps, logging, evaluation, and checkpointing.
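The two trainers expect differently shaped dataset rows. As a minimal sketch, the row layouts look roughly as follows (the DPO field names `prompt`/`chosen`/`rejected` follow TRL's standard preference-dataset convention; the example text itself is invented for illustration):

```python
# Illustrative row shapes only; the real rows come from the pipeline's
# formatted dataset, not from these literals.

# SFTTrainer consumes a single text column (here named "text", matching
# the dataset_text_field argument used below).
sft_row = {
    "text": "### Instruction:\nSummarize LoRA.\n\n### Response:\n"
            "LoRA injects trainable low-rank adapter matrices into frozen weights.",
}

# DPOTrainer consumes prompt/chosen/rejected triples (TRL's standard
# preference format).
dpo_row = {
    "prompt": "Explain gradient accumulation.",
    "chosen": "Gradients are summed over several micro-batches before one optimizer step.",
    "rejected": "It just makes training faster.",
}

print(sorted(dpo_row))
```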
SFT Training Implementation
Key Code

```python
# From llm_engineering/model/finetuning/finetune.py
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        learning_rate=learning_rate,
        num_train_epochs=num_train_epochs,
        per_device_train_batch_size=per_device_train_batch_size,
        gradient_accumulation_steps=8,
        optim="adamw_8bit",
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        report_to="comet_ml",
    ),
)
trainer.train()
```
SFT Parameters
| Parameter | Type | Value | Description |
|---|---|---|---|
| `model` | Model | — | The model with LoRA adapters injected. |
| `tokenizer` | Tokenizer | — | The corresponding tokenizer. |
| `train_dataset` | Dataset | `dataset["train"]` | Training split of the formatted dataset. |
| `eval_dataset` | Dataset | `dataset["test"]` | Evaluation split for validation loss monitoring. |
| `dataset_text_field` | str | `"text"` | Column name containing the formatted training text. |
| `max_seq_length` | int | 2048 | Maximum sequence length for tokenization/truncation. |
| `learning_rate` | float | 3e-4 | Learning rate for the AdamW optimizer. |
| `num_train_epochs` | int | 3 | Number of training epochs. |
| `per_device_train_batch_size` | int | 2 | Batch size per GPU device. |
| `gradient_accumulation_steps` | int | 8 | Number of steps to accumulate gradients before an optimizer update. Effective batch size = 2 × 8 = 16. |
| `optim` | str | `"adamw_8bit"` | 8-bit AdamW optimizer to reduce memory usage. |
| `fp16` / `bf16` | bool | Auto-detected | Mixed-precision training. BF16 preferred when supported (Ampere+ GPUs). |
| `report_to` | str | `"comet_ml"` | Experiment tracking platform for logging metrics. |
DPO Training Implementation
Key Code

```python
# From llm_engineering/model/finetuning/finetune.py
trainer = DPOTrainer(
    model=model,
    ref_model=None,
    tokenizer=tokenizer,
    beta=0.1,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    args=DPOConfig(
        learning_rate=learning_rate,
        num_train_epochs=num_train_epochs,
        per_device_train_batch_size=per_device_train_batch_size,
        gradient_accumulation_steps=8,
        optim="adamw_8bit",
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        report_to="comet_ml",
    ),
)
trainer.train()
```
DPO-Specific Parameters
| Parameter | Type | Value | Description |
|---|---|---|---|
| `ref_model` | Model or None | `None` | Reference model for DPO. When `None`, the trainer uses the initial model weights as the reference (implicit reference). |
| `beta` | float | 0.1 | Temperature parameter for the DPO loss, controlling how strongly the policy is tied to the reference model. Lower values let the policy deviate further from the reference when separating preferred responses; higher values keep it closer to the reference. |
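For intuition about `beta`, the per-example DPO objective can be sketched as a scalar function (a simplified illustration; the real trainer operates on summed token log-probabilities over batches):

```python
import math

def dpo_loss(pi_chosen_logp, pi_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Scalar sketch of the DPO loss:
    -log sigmoid(beta * [(log pi_c - log ref_c) - (log pi_r - log ref_r)])."""
    margin = (pi_chosen_logp - ref_chosen_logp) - (pi_rejected_logp - ref_rejected_logp)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# With no preference margin the loss is log 2; as the policy favours the
# chosen response relative to the reference, the loss falls toward zero.
print(round(dpo_loss(0.0, 0.0, 0.0, 0.0), 4))                          # 0.6931
print(dpo_loss(5.0, -5.0, 0.0, 0.0) < dpo_loss(0.0, 0.0, 0.0, 0.0))   # True
```

Note how `beta` scales the margin inside the sigmoid: a larger `beta` saturates the loss at smaller margins, so the policy needs to move less from the reference to drive the loss down.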
Returns
`trainer.train()` returns a `TrainOutput` object, but the primary effect is in-place modification of the model's LoRA adapter weights. The fine-tuned model is available via the same `model` reference after training completes.
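In `transformers`, `TrainOutput` is a named tuple carrying `global_step`, `training_loss`, and a `metrics` dict. A stand-in sketch of how the result is typically inspected (the field values below are invented):

```python
from collections import namedtuple

# Stand-in for transformers.trainer_utils.TrainOutput, a NamedTuple with
# global_step, training_loss, and metrics; values here are illustrative only.
TrainOutput = namedtuple("TrainOutput", ["global_step", "training_loss", "metrics"])

result = TrainOutput(
    global_step=300,
    training_loss=1.42,
    metrics={"train_runtime": 1234.5, "train_samples_per_second": 3.9},
)

print(result.global_step, round(result.training_loss, 2))
```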
Training Metrics
Both trainers log the following metrics to Comet ML during training:
- Training loss: Per-step and per-epoch loss values.
- Evaluation loss: Validation loss computed on the test split.
- Learning rate schedule: Current learning rate at each step.
- GPU memory usage: Peak memory utilization.
External Dependencies
| Package | Purpose |
|---|---|
| `trl` | TRL library providing `SFTTrainer` and `DPOTrainer` |
| `transformers` | `TrainingArguments` configuration and training infrastructure |
| `unsloth` | BF16 support detection via `is_bfloat16_supported()` |
| `comet_ml` | Experiment tracking and metric logging |
See Also
- Principle:PacktPublishing_LLM_Engineers_Handbook_Supervised_Finetuning
- Environment:PacktPublishing_LLM_Engineers_Handbook_Unsloth_Finetuning_Environment
- Heuristic:PacktPublishing_LLM_Engineers_Handbook_LoRA_Finetuning_Parameters
- Heuristic:PacktPublishing_LLM_Engineers_Handbook_DPO_Training_Configuration