Implementation:Huggingface Alignment handbook SFTTrainer Usage

From Leeroopedia


Knowledge Sources
Domains: NLP, Deep_Learning, Training
Last Updated: 2026-02-07 00:00 GMT

Overview

A concrete tool for supervised fine-tuning of language models with TRL's SFTTrainer, as configured by the alignment-handbook's training script.

Description

SFTTrainer is TRL's supervised fine-tuning trainer that extends HuggingFace's Trainer with SFT-specific features including sequence packing, chat template application, and optional PEFT/LoRA adapter injection. In the alignment-handbook, it is initialized in scripts/sft.py with model, training arguments, dataset, tokenizer, and optional PEFT config.

The alignment-handbook's SFT script adds:

  • Chat template fallback to ChatML if the model has no template
  • Checkpoint resumption support (sketched after this list)
  • Model card creation and HuggingFace Hub publishing
  • EOS token alignment between model generation config and tokenizer
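
For the checkpoint-resumption bullet above, a minimal sketch of the pattern the script follows, assuming the handbook's alignment.get_checkpoint helper (which scans output_dir for the most recent checkpoint):

# Sketch only; get_checkpoint, training_args, and trainer come from the handbook setup
from alignment import get_checkpoint

last_checkpoint = get_checkpoint(training_args)
if last_checkpoint is not None and training_args.resume_from_checkpoint is None:
    print(f"Checkpoint detected, resuming training at {last_checkpoint}.")

train_result = trainer.train(resume_from_checkpoint=last_checkpoint)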

Usage

Use this when running the SFT stage of any alignment pipeline in the handbook. This is the standard entry point for the first training stage.

Code Reference

Source Location

  • Repository: alignment-handbook
  • File: scripts/sft.py (lines 105-112 for the SFTTrainer init; lines 54-174 for the full main function)

Signature

# From scripts/sft.py:L105-112
trainer = SFTTrainer(
    model=model,                    # AutoModelForCausalLM from get_model()
    args=training_args,             # SFTConfig with all training hyperparameters
    train_dataset=dataset[script_args.dataset_train_split],
    eval_dataset=(
        dataset[script_args.dataset_test_split]
        if training_args.eval_strategy != "no"
        else None
    ),
    processing_class=tokenizer,     # PreTrainedTokenizer from get_tokenizer()
    peft_config=get_peft_config(model_args),  # None or LoraConfig
)
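
The training_args passed above come from a parsed recipe, but a standalone construction helps show what SFTConfig carries. A minimal sketch; field names follow TRL's SFTConfig (which the handbook's subclass extends), and the values are illustrative rather than the published recipe defaults:

from alignment import SFTConfig

# Illustrative values only; real runs populate these from a recipe YAML
training_args = SFTConfig(
    output_dir="data/zephyr-7b-sft-full",
    learning_rate=2.0e-5,
    num_train_epochs=1,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    packing=True,  # SFT-specific: pack short examples into full-length sequences
    bf16=True,
    logging_steps=5,
    eval_strategy="no",
)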

Import

from trl import SFTTrainer, get_peft_config, setup_chat_format
from alignment import SFTConfig, ScriptArguments, get_dataset, get_model, get_tokenizer

I/O Contract

Inputs

  • model (AutoModelForCausalLM, required): pretrained model from get_model()
  • args (SFTConfig, required): training hyperparameters (learning_rate, num_train_epochs, batch size, etc.)
  • train_dataset (Dataset, required): training split from get_dataset()
  • eval_dataset (Dataset, optional): evaluation split (None when eval_strategy="no")
  • processing_class (PreTrainedTokenizer, required): tokenizer from get_tokenizer()
  • peft_config (Optional[PeftConfig], optional): LoRA config from get_peft_config(); None for full fine-tuning
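
When PEFT is enabled in the model arguments, TRL's get_peft_config returns a peft.LoraConfig assembled from fields such as lora_r and lora_alpha; with use_peft=False it returns None and the trainer does full fine-tuning. A hand-rolled equivalent with illustrative values:

from peft import LoraConfig

# Roughly what get_peft_config(model_args) yields when use_peft=True;
# r/alpha/dropout values here are illustrative, not recipe defaults
peft_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",
)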

Outputs

  • trainer.train() return value (TrainOutput): contains global_step, training_loss, and metrics
  • checkpoints (files): saved to training_args.output_dir
  • metrics (dict): training and evaluation metrics, logged and saved
  • model_card (file): generated model card with training details
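
These outputs are typically consumed with the standard transformers.Trainer helpers, which SFTTrainer inherits; train_result below is the TrainOutput from trainer.train():

# Log and persist training metrics alongside the trainer state
metrics = train_result.metrics
trainer.log_metrics("train", metrics)
trainer.save_metrics("train", metrics)
trainer.save_state()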

Usage Examples

Full SFT Training Pipeline

from alignment import ScriptArguments, SFTConfig, get_dataset, get_model, get_tokenizer
from trl import ModelConfig, SFTTrainer, TrlParser, get_peft_config, setup_chat_format
from transformers import set_seed

# 1. Parse config
parser = TrlParser((ScriptArguments, SFTConfig, ModelConfig))
script_args, training_args, model_args = parser.parse_args_and_config()
set_seed(training_args.seed)

# 2. Load data, tokenizer, model
dataset = get_dataset(script_args)
tokenizer = get_tokenizer(model_args, training_args)
model = get_model(model_args, training_args)

# 3. Chat template fallback
if tokenizer.chat_template is None:
    model, tokenizer = setup_chat_format(model, tokenizer, format="chatml")

# 4. Initialize trainer
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset.get("test"),
    processing_class=tokenizer,
    peft_config=get_peft_config(model_args),
)

# 5. Train
train_result = trainer.train()

# 6. Align the generation config's EOS token with the tokenizer, then save
trainer.model.generation_config.eos_token_id = tokenizer.eos_token_id
trainer.save_model(training_args.output_dir)
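
The script's tail then evaluates and publishes; a sketch using standard Trainer methods (evaluate, log_metrics, save_metrics, push_to_hub):

# 7. Evaluate, if an eval split was configured
if training_args.eval_strategy != "no":
    metrics = trainer.evaluate()
    trainer.log_metrics("eval", metrics)
    trainer.save_metrics("eval", metrics)

# 8. Publish to the HuggingFace Hub (also uploads the generated model card)
if training_args.push_to_hub:
    trainer.push_to_hub()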

CLI Launch

# Full fine-tuning with ZeRO-3
accelerate launch --config_file recipes/accelerate_configs/zero3.yaml \
    scripts/sft.py \
    --config recipes/zephyr-7b-beta/sft/config_full.yaml

# QLoRA fine-tuning on single GPU
accelerate launch --config_file recipes/accelerate_configs/ddp.yaml \
    scripts/sft.py \
    --config recipes/zephyr-7b-beta/sft/config_qlora.yaml
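
The --config YAML feeds all three parsed dataclasses at once. An illustrative fragment; key names follow ScriptArguments, SFTConfig, and ModelConfig, and the values are examples rather than the published Zephyr recipe:

# Model (ModelConfig)
model_name_or_path: mistralai/Mistral-7B-v0.1
torch_dtype: bfloat16

# Data (ScriptArguments)
dataset_name: HuggingFaceH4/ultrachat_200k

# Training (SFTConfig)
output_dir: data/zephyr-7b-sft-full
learning_rate: 2.0e-05
num_train_epochs: 1
per_device_train_batch_size: 8
gradient_accumulation_steps: 2
bf16: true
push_to_hub: false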

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
