Implementation: Hugging Face Alignment Handbook SFTTrainer Mid-Training
| Knowledge Sources | |
|---|---|
| Domains | NLP, Deep_Learning, Training |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tool for reasoning mid-training using TRL's SFTTrainer with packing and Liger kernel, as configured by the alignment-handbook SmolLM3 mid-training recipe.
Description
This is the same SFTTrainer used in standard SFT, but configured specifically for the mid-training stage with reasoning-focused settings: a two-dataset mixture of reasoning data, 32k sequence length, sequence packing enabled, the Liger kernel for memory efficiency, and 5 training epochs.
Mid-training uses the same scripts/sft.py entry point as standard SFT; the stage is differentiated entirely by the YAML config file (recipes/smollm3/sft/mid.yaml).
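For orientation, the snippet below simply loads the recipe YAML and prints the settings that distinguish the mid-training stage; it is a minimal sketch (assuming PyYAML is available), not part of the handbook itself.
# Sketch: inspect the stage-specific keys of the mid-training recipe
import yaml

with open("recipes/smollm3/sft/mid.yaml") as f:
    recipe = yaml.safe_load(f)

# These keys are what separate mid.yaml from the standard SFT recipe
for key in ("packing", "max_seq_length", "use_liger_kernel", "num_train_epochs", "dataset_mixture"):
    print(key, "=", recipe.get(key))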
Usage
Use this for the first post-training stage when building reasoning-capable models in a multi-stage pipeline.
Code Reference
Source Location
- Repository: alignment-handbook
- File: scripts/sft.py (lines 105-112 for SFTTrainer init)
- Config: recipes/smollm3/sft/mid.yaml (lines 1-64)
Signature
# Same SFTTrainer, different config
trainer = SFTTrainer(
    model=model,                              # SmolLM3-3B-Base
    args=training_args,                       # SFTConfig with packing=True, max_length=32768
    train_dataset=dataset["train"],           # Reasoning mixture (Llama_Nemotron + OpenThoughts3)
    eval_dataset=dataset.get("test"),
    processing_class=tokenizer,
    peft_config=get_peft_config(model_args),  # None (full fine-tuning)
)
Import
from trl import SFTTrainer, get_peft_config, setup_chat_format
from alignment import SFTConfig, ScriptArguments, get_dataset, get_model, get_tokenizer
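The following sketch outlines how these pieces fit together in scripts/sft.py; the exact signatures of get_model, get_tokenizer, and get_dataset are assumptions here, so treat it as an approximation of the script flow rather than a verbatim excerpt.
# Sketch of the sft.py flow (helper signatures are assumptions)
model = get_model(model_args, training_args)          # SmolLM3-3B-Base, trust_remote_code=True
tokenizer = get_tokenizer(model_args, training_args)
dataset = get_dataset(script_args)                    # reasoning mixture defined by dataset_mixture

trainer = SFTTrainer(
    model=model,
    args=training_args,                               # SFTConfig parsed from mid.yaml
    train_dataset=dataset["train"],
    eval_dataset=dataset.get("test"),
    processing_class=tokenizer,
    peft_config=get_peft_config(model_args),          # None => full fine-tuning
)

trainer.train()
trainer.save_model(training_args.output_dir)          # e.g., data/SmolLM3-Mid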
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | AutoModelForCausalLM | Yes | Base pretrained model (e.g., SmolLM3-3B-Base) |
| args | SFTConfig | Yes | Mid-training config with packing=True, max_length=32768 |
| args.packing | bool | Yes | Must be True for mid-training (packs sequences for efficiency) |
| args.max_seq_length | int | Yes | 32768 for mid-training |
| args.use_liger_kernel | bool | No | True for Liger kernel memory optimization |
| args.num_train_epochs | int | Yes | 5 epochs for mid-training |
| train_dataset | Dataset | Yes | Reasoning dataset mixture (2 splits) |
| model_args.trust_remote_code | bool | Yes | True for SmolLM3 custom architecture |
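For reference, the same inputs can be expressed programmatically. The sketch below uses trl's SFTConfig directly with the values from mid.yaml; the handbook imports its own SFTConfig from alignment, and the sequence-length parameter is named max_seq_length in older TRL releases.
# Sketch: mid-training arguments built directly with trl.SFTConfig (values from mid.yaml)
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="data/SmolLM3-Mid",
    packing=True,                     # pack multiple samples into each 32k-token sequence
    max_length=32768,                 # max_seq_length in older TRL releases
    use_liger_kernel=True,            # fused Liger kernels for memory efficiency
    num_train_epochs=5,
    learning_rate=3.0e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
)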
Outputs
| Name | Type | Description |
|---|---|---|
| checkpoints | Files | Mid-trained model checkpoint at output_dir (e.g., data/SmolLM3-Mid) |
| metrics | Dict | Training metrics (loss, learning rate, throughput) |
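The resulting checkpoint is a standard Hugging Face model directory, so the next post-training stage can load it directly; a minimal sketch:
# Sketch: load the mid-trained checkpoint for the next stage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "data/SmolLM3-Mid"  # output_dir from mid.yaml
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.bfloat16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(ckpt)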
Usage Examples
Mid-Training YAML Config
# From recipes/smollm3/sft/mid.yaml
model_name_or_path: HuggingFaceTB/SmolLM3-3B-Base
trust_remote_code: true
torch_dtype: bfloat16
attn_implementation: flash_attention_2
# Mid-training specific settings
packing: true
max_seq_length: 32768
use_liger_kernel: true
num_train_epochs: 5
learning_rate: 3.0e-5
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
gradient_checkpointing: true
output_dir: data/SmolLM3-Mid
# Reasoning dataset mixture
dataset_mixture:
  datasets:
    - id: HuggingFaceTB/smoltalk2
      config: Llama_Nemotron
      columns: [messages]
      weight: 1.0
    - id: HuggingFaceTB/smoltalk2
      config: OpenThoughts3
      columns: [messages]
      weight: 1.0
  seed: 42
  test_split_size: 0.01
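The dataset_mixture block is resolved by the handbook's get_dataset helper. As a rough illustration only (not the handbook's implementation, and the split name "train" is an assumption), an equal-weight mixture with a 1% held-out split could be assembled with the datasets library like this:
# Illustrative sketch of the two-config reasoning mixture (not the handbook's get_dataset)
from datasets import concatenate_datasets, load_dataset

parts = [
    load_dataset("HuggingFaceTB/smoltalk2", cfg, split="train").select_columns(["messages"])
    for cfg in ("Llama_Nemotron", "OpenThoughts3")   # weight 1.0 each => simple concatenation
]
mixture = concatenate_datasets(parts).shuffle(seed=42)
dataset = mixture.train_test_split(test_size=0.01, seed=42)  # ~1% held out for eval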
CLI Launch (Multi-Node)
# Mid-training on 8 nodes with DeepSpeed ZeRO-3
accelerate launch --config_file recipes/accelerate_configs/zero3.yaml \
    --num_machines 8 \
    scripts/sft.py \
    --config recipes/smollm3/sft/mid.yaml