Implementation: Hugging Face Alignment Handbook SFTTrainer Mid-Training
| Knowledge Sources | |
|---|---|
| Domains | NLP, Deep_Learning, Training |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tool for reasoning mid-training using TRL's SFTTrainer with packing and Liger kernel, as configured by the alignment-handbook SmolLM3 mid-training recipe.
Description
This is the same SFTTrainer used in standard SFT, but configured specifically for the mid-training stage with reasoning-focused settings: a two-dataset mixture of reasoning data, 32k sequence length, sequence packing enabled, the Liger kernel for memory efficiency, and 5 training epochs.
Mid-training uses the same scripts/sft.py entry point as standard SFT; the stage is differentiated entirely by the YAML config file (recipes/smollm3/sft/mid.yaml).
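For orientation, the snippet below simply loads the recipe YAML and prints the settings that distinguish the mid-training stage; it is a minimal sketch (assuming PyYAML is available), not part of the handbook itself.
# Sketch: inspect the stage-specific keys of the mid-training recipe
import yaml

with open("recipes/smollm3/sft/mid.yaml") as f:
    recipe = yaml.safe_load(f)

# These keys are what separate mid.yaml from the standard SFT recipe
for key in ("packing", "max_seq_length", "use_liger_kernel", "num_train_epochs", "dataset_mixture"):
    print(key, "=", recipe.get(key))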
Usage
Use this for the first post-training stage when building reasoning-capable models in a multi-stage pipeline.
Code Reference
Source Location
- Repository: alignment-handbook
- File: scripts/sft.py (lines 105-112 for SFTTrainer init)
- Config: recipes/smollm3/sft/mid.yaml (lines 1-64)
Signature
# Same SFTTrainer, different config
trainer = SFTTrainer(
    model=model,                              # SmolLM3-3B-Base
    args=training_args,                       # SFTConfig with packing=True, max_length=32768
    train_dataset=dataset["train"],           # Reasoning mixture (Llama_Nemotron + OpenThoughts3)
    eval_dataset=dataset.get("test"),
    processing_class=tokenizer,
    peft_config=get_peft_config(model_args),  # None (full fine-tuning)
)
Import
from trl import SFTTrainer, get_peft_config, setup_chat_format
from alignment import SFTConfig, ScriptArguments, get_dataset, get_model, get_tokenizer
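The following sketch outlines how these pieces fit together in scripts/sft.py; the exact signatures of get_model, get_tokenizer, and get_dataset are assumptions here, so treat it as an approximation of the script flow rather than a verbatim excerpt.
# Sketch of the sft.py flow (helper signatures are assumptions)
model = get_model(model_args, training_args)          # SmolLM3-3B-Base, trust_remote_code=True
tokenizer = get_tokenizer(model_args, training_args)
dataset = get_dataset(script_args)                    # reasoning mixture defined by dataset_mixture

trainer = SFTTrainer(
    model=model,
    args=training_args,                               # SFTConfig parsed from mid.yaml
    train_dataset=dataset["train"],
    eval_dataset=dataset.get("test"),
    processing_class=tokenizer,
    peft_config=get_peft_config(model_args),          # None => full fine-tuning
)

trainer.train()
trainer.save_model(training_args.output_dir)          # e.g., data/SmolLM3-Mid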
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | AutoModelForCausalLM | Yes | Base pretrained model (e.g., SmolLM3-3B-Base) |
| args | SFTConfig | Yes | Mid-training config with packing=True, max_length=32768 |
| args.packing | bool | Yes | Must be True for mid-training (packs sequences for efficiency) |
| args.max_seq_length | int | Yes | 32768 for mid-training |
| args.use_liger_kernel | bool | No | True for Liger kernel memory optimization |
| args.num_train_epochs | int | Yes | 5 epochs for mid-training |
| train_dataset | Dataset | Yes | Reasoning dataset mixture (2 splits) |
| model_args.trust_remote_code | bool | Yes | True for SmolLM3 custom architecture |
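For reference, the same inputs can be expressed programmatically. The sketch below uses trl's SFTConfig directly with the values from mid.yaml; the handbook imports its own SFTConfig from alignment, and the sequence-length parameter is named max_seq_length in older TRL releases.
# Sketch: mid-training arguments built directly with trl.SFTConfig (values from mid.yaml)
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="data/SmolLM3-Mid",
    packing=True,                     # pack multiple samples into each 32k-token sequence
    max_length=32768,                 # max_seq_length in older TRL releases
    use_liger_kernel=True,            # fused Liger kernels for memory efficiency
    num_train_epochs=5,
    learning_rate=3.0e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
)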
Outputs
| Name | Type | Description |
|---|---|---|
| checkpoints | Files | Mid-trained model checkpoint at output_dir (e.g., data/SmolLM3-Mid) |
| metrics | Dict | Training metrics (loss, learning rate, throughput) |
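The resulting checkpoint is a standard Hugging Face model directory, so the next post-training stage can load it directly; a minimal sketch:
# Sketch: load the mid-trained checkpoint for the next stage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "data/SmolLM3-Mid"  # output_dir from mid.yaml
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.bfloat16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(ckpt)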
Usage Examples
Mid-Training YAML Config
# From recipes/smollm3/sft/mid.yaml
model_name_or_path: HuggingFaceTB/SmolLM3-3B-Base
trust_remote_code: true
torch_dtype: bfloat16
attn_implementation: flash_attention_2
# Mid-training specific settings
packing: true
max_seq_length: 32768
use_liger_kernel: true
num_train_epochs: 5
learning_rate: 3.0e-5
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
gradient_checkpointing: true
output_dir: data/SmolLM3-Mid
# Reasoning dataset mixture
dataset_mixture:
  datasets:
    - id: HuggingFaceTB/smoltalk2
      config: Llama_Nemotron
      columns: [messages]
      weight: 1.0
    - id: HuggingFaceTB/smoltalk2
      config: OpenThoughts3
      columns: [messages]
      weight: 1.0
  seed: 42
  test_split_size: 0.01
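The dataset_mixture block is resolved by the handbook's get_dataset helper. As a rough illustration only (not the handbook's implementation, and the split name "train" is an assumption), an equal-weight mixture with a 1% held-out split could be assembled with the datasets library like this:
# Illustrative sketch of the two-config reasoning mixture (not the handbook's get_dataset)
from datasets import concatenate_datasets, load_dataset

parts = [
    load_dataset("HuggingFaceTB/smoltalk2", cfg, split="train").select_columns(["messages"])
    for cfg in ("Llama_Nemotron", "OpenThoughts3")   # weight 1.0 each => simple concatenation
]
mixture = concatenate_datasets(parts).shuffle(seed=42)
dataset = mixture.train_test_split(test_size=0.01, seed=42)  # ~1% held out for eval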
CLI Launch (Multi-Node)
# Mid-training on 8 nodes with DeepSpeed ZeRO-3
accelerate launch --config_file recipes/accelerate_configs/zero3.yaml \
    --num_machines 8 \
    scripts/sft.py \
    --config recipes/smollm3/sft/mid.yaml