Implementation: Hugging Face Alignment Handbook SFTTrainer Usage
| Knowledge Sources | |
|---|---|
| Domains | NLP, Deep_Learning, Training |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tool for supervised fine-tuning of language models using TRL's SFTTrainer, as configured by the alignment-handbook training script.
Description
SFTTrainer is TRL's supervised fine-tuning trainer. It extends Hugging Face's Trainer with SFT-specific features, including sequence packing, chat-template application, and optional PEFT/LoRA adapter injection. In the alignment-handbook, it is initialized in scripts/sft.py with the model, training arguments, dataset, tokenizer, and an optional PEFT config.
The alignment-handbook's SFT script adds:
- Chat template fallback to ChatML if the model has no template
- Checkpoint resumption support
- Model card creation and HuggingFace Hub publishing
- EOS token alignment between model generation config and tokenizer
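The ChatML fallback can be illustrated with a minimal sketch. This is plain Python emulating the rendered output, not the actual Jinja template that TRL's setup_chat_format installs:

```python
# Minimal illustration of the ChatML message format installed when a
# tokenizer ships without a chat template. A plain-Python stand-in,
# not TRL's actual Jinja template.
CHATML_TURN = "<|im_start|>{role}\n{content}<|im_end|>\n"

def render_chatml(messages):
    """Render a list of {"role", "content"} dicts as a ChatML string."""
    return "".join(CHATML_TURN.format(**m) for m in messages)

example = [
    {"role": "user", "content": "What is SFT?"},
    {"role": "assistant", "content": "Supervised fine-tuning."},
]
print(render_chatml(example))
```

Each conversation turn is wrapped in `<|im_start|>`/`<|im_end|>` markers, which is why EOS token alignment matters: generation must stop on the token the template ends turns with.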
Usage
Use this when running the SFT stage of any alignment pipeline in the handbook. This is the standard entry point for the first training stage.
Code Reference
Source Location
- Repository: alignment-handbook
- File: scripts/sft.py (lines 105-112 for SFTTrainer init, lines 54-174 for full main function)
Signature
```python
# From scripts/sft.py:L105-112
trainer = SFTTrainer(
    model=model,  # AutoModelForCausalLM from get_model()
    args=training_args,  # SFTConfig with all training hyperparameters
    train_dataset=dataset[script_args.dataset_train_split],
    eval_dataset=(
        dataset[script_args.dataset_test_split]
        if training_args.eval_strategy != "no"
        else None
    ),
    processing_class=tokenizer,  # PreTrainedTokenizer from get_tokenizer()
    peft_config=get_peft_config(model_args),  # None or LoraConfig
)
```
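The peft_config argument decides whether LoRA adapters are injected or the full model is trained. A rough sketch of the decision trl.get_peft_config makes — the dataclass fields here are illustrative stand-ins for trl.ModelConfig, and the returned dict stands in for a peft LoraConfig:

```python
from dataclasses import dataclass

@dataclass
class ModelArgsSketch:
    # Hypothetical stand-in for the relevant trl.ModelConfig fields.
    use_peft: bool = False
    lora_r: int = 16
    lora_alpha: int = 32

def peft_config_sketch(args):
    # Mirrors the behavior of trl.get_peft_config: None means full
    # fine-tuning; otherwise a LoRA config is built from the model args.
    if not args.use_peft:
        return None
    return {"r": args.lora_r, "lora_alpha": args.lora_alpha}

peft_config_sketch(ModelArgsSketch())               # full fine-tuning -> None
peft_config_sketch(ModelArgsSketch(use_peft=True))  # LoRA settings
```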
Import
```python
from trl import SFTTrainer, get_peft_config, setup_chat_format
from alignment import SFTConfig, ScriptArguments, get_dataset, get_model, get_tokenizer
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | AutoModelForCausalLM | Yes | Pretrained model from get_model() |
| args | SFTConfig | Yes | Training hyperparameters (learning_rate, num_train_epochs, batch_size, etc.) |
| train_dataset | Dataset | Yes | Training split from get_dataset() |
| eval_dataset | Dataset | No | Evaluation split (None if eval_strategy="no") |
| processing_class | PreTrainedTokenizer | Yes | Tokenizer from get_tokenizer() |
| peft_config | Optional[PeftConfig] | No | LoRA config from get_peft_config() (None for full fine-tuning) |
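The eval_dataset row follows the conditional in scripts/sft.py. A small sketch of that resolution logic, using a plain dict in place of a DatasetDict:

```python
def resolve_eval_split(splits, test_split, eval_strategy):
    # Evaluation data is only passed to SFTTrainer when eval_strategy
    # is not "no"; otherwise eval_dataset is None.
    return splits.get(test_split) if eval_strategy != "no" else None

splits = {"train": ["sample_a"], "test": ["sample_b"]}
resolve_eval_split(splits, "test", "steps")  # -> ["sample_b"]
resolve_eval_split(splits, "test", "no")     # -> None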
Outputs
| Name | Type | Description |
|---|---|---|
| trainer.train() | TrainOutput | Return value with global_step, training_loss, metrics |
| checkpoints | Files | Saved to training_args.output_dir |
| metrics | Dict | Training and evaluation metrics logged and saved |
| model_card | File | Generated model card with training details |
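The metrics output corresponds to Trainer's save_metrics behavior, which serializes metrics as JSON into output_dir. A rough emulation, assuming the standard {split}_results.json naming:

```python
import json
import os
import tempfile

def save_metrics_sketch(split, metrics, output_dir):
    # Rough emulation of Trainer.save_metrics: serialize the metrics
    # dict to <output_dir>/<split>_results.json.
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, f"{split}_results.json")
    with open(path, "w") as f:
        json.dump(metrics, f, indent=4)
    return path

out_dir = tempfile.mkdtemp()
path = save_metrics_sketch("train", {"train_loss": 1.23, "global_step": 100}, out_dir)
```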
Usage Examples
Full SFT Training Pipeline
```python
from alignment import ScriptArguments, SFTConfig, get_dataset, get_model, get_tokenizer
from transformers import set_seed
from trl import ModelConfig, SFTTrainer, TrlParser, get_peft_config, setup_chat_format

# 1. Parse config
parser = TrlParser((ScriptArguments, SFTConfig, ModelConfig))
script_args, training_args, model_args = parser.parse_args_and_config()
set_seed(training_args.seed)

# 2. Load data, tokenizer, model
dataset = get_dataset(script_args)
tokenizer = get_tokenizer(model_args, training_args)
model = get_model(model_args, training_args)

# 3. Chat template fallback
if tokenizer.chat_template is None:
    model, tokenizer = setup_chat_format(model, tokenizer, format="chatml")

# 4. Initialize trainer
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset.get("test"),
    processing_class=tokenizer,
    peft_config=get_peft_config(model_args),
)

# 5. Train
train_result = trainer.train()

# 6. Save
trainer.model.generation_config.eos_token_id = tokenizer.eos_token_id
trainer.save_model(training_args.output_dir)
```
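Checkpoint resumption (step 5 accepts a resume_from_checkpoint argument) depends on locating the newest checkpoint-&lt;step&gt; directory in output_dir. A sketch of that lookup, modeled on transformers.trainer_utils.get_last_checkpoint:

```python
import os
import re
import tempfile

_CKPT_RE = re.compile(r"^checkpoint-(\d+)$")

def last_checkpoint_sketch(output_dir):
    # Modeled on transformers.trainer_utils.get_last_checkpoint: return
    # the checkpoint-<step> subdirectory with the highest step, or None.
    candidates = [
        d for d in os.listdir(output_dir)
        if _CKPT_RE.match(d) and os.path.isdir(os.path.join(output_dir, d))
    ]
    if not candidates:
        return None
    newest = max(candidates, key=lambda d: int(_CKPT_RE.match(d).group(1)))
    return os.path.join(output_dir, newest)

out_dir = tempfile.mkdtemp()
for step in (100, 500, 1000):
    os.makedirs(os.path.join(out_dir, f"checkpoint-{step}"))
```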
CLI Launch
```shell
# Full fine-tuning with ZeRO-3
accelerate launch --config_file recipes/accelerate_configs/zero3.yaml \
    scripts/sft.py \
    --config recipes/zephyr-7b-beta/sft/config_full.yaml

# QLoRA fine-tuning on single GPU
accelerate launch --config_file recipes/accelerate_configs/ddp.yaml \
    scripts/sft.py \
    --config recipes/zephyr-7b-beta/sft/config_qlora.yaml
```
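TrlParser layers CLI flags over the YAML config, so individual hyperparameters can be overridden at launch without editing the recipe (flag names below assume standard SFTConfig fields):

```shell
# Override selected hyperparameters from the recipe at launch time
accelerate launch --config_file recipes/accelerate_configs/ddp.yaml \
    scripts/sft.py \
    --config recipes/zephyr-7b-beta/sft/config_qlora.yaml \
    --learning_rate=1.0e-4 \
    --num_train_epochs=1
```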