Implementation: Hugging Face Alignment Handbook SFTTrainer Multi-Task
| Knowledge Sources | Details |
|---|---|
| Domains | NLP, Deep_Learning, Training |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tool for multi-task SFT training using TRL's SFTTrainer with a 25-split dataset mixture, assistant-only loss, and FFD packing, as configured by the alignment-handbook SmolLM3 SFT recipe.
Description
This is the same SFTTrainer used in standard SFT but configured for advanced multi-task training. The SmolLM3 SFT recipe (recipes/smollm3/sft/sft.yaml) is the most complex configuration in the alignment-handbook, featuring 25 dataset splits with per-split weights, assistant_only_loss for selective loss masking, packing_strategy: ffd for efficient sequence binning, and a custom Jinja2 chat template supporting thinking modes.
The training uses the same scripts/sft.py entry point, differentiated entirely by the YAML config.
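The packing_strategy: ffd option bins variable-length sequences with a first-fit-decreasing heuristic instead of naive concatenation, reducing padding waste. A minimal, illustrative sketch of FFD bin packing follows; this is not TRL's actual implementation, and the function and variable names are hypothetical:

```python
def ffd_pack(seq_lengths, max_seq_length):
    """Pack sequence lengths into bins of capacity max_seq_length
    using the first-fit-decreasing (FFD) heuristic."""
    bins = []       # each bin is a list of sequence lengths
    remaining = []  # free capacity left in each bin
    # Sort longest-first, then place each sequence in the first bin it fits.
    for length in sorted(seq_lengths, reverse=True):
        for i, free in enumerate(remaining):
            if length <= free:
                bins[i].append(length)
                remaining[i] -= length
                break
        else:  # no existing bin has room: open a new one
            bins.append([length])
            remaining.append(max_seq_length - length)
    return bins

# Example: pack six sequences into bins of capacity 10
packed = ffd_pack([7, 5, 4, 3, 2, 1], max_seq_length=10)
print(packed)  # [[7, 3], [5, 4, 1], [2]]
```

Sorting longest-first is what distinguishes FFD from plain first-fit: large sequences claim bins early, and short sequences backfill the gaps, which typically yields fewer bins and therefore fewer padded tokens per batch.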
Usage
Use this for the SFT stage of advanced multi-stage pipelines where diverse multi-task capabilities and reasoning modes are required.
Code Reference
Source Location
- Repository: alignment-handbook
- File: scripts/sft.py (lines 105-112 for SFTTrainer init)
- Config: recipes/smollm3/sft/sft.yaml (lines 1-228)
Signature
# Same SFTTrainer, advanced config
trainer = SFTTrainer(
    model=model,                     # SmolLM3-3B mid-training checkpoint
    args=training_args,              # SFTConfig with assistant_only_loss, packing, FFD
    train_dataset=dataset["train"],  # 25-split mixture
    eval_dataset=dataset.get("test"),
    processing_class=tokenizer,      # tokenizer with thinking-mode chat template
    peft_config=get_peft_config(model_args),  # None (full fine-tuning)
)
Import
from trl import SFTTrainer, get_peft_config
from alignment import SFTConfig, ScriptArguments, get_dataset, get_model, get_tokenizer
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | AutoModelForCausalLM | Yes | Mid-training checkpoint (e.g., SmolLM3-3B-Mid) |
| args | SFTConfig | Yes | Multi-task SFT config |
| args.assistant_only_loss | bool | Yes | Set to true; loss is computed only on assistant tokens |
| args.packing | bool | Yes | Set to true; enables sequence packing |
| args.packing_strategy | str | No | "ffd" for first-fit-decreasing bin packing |
| args.max_seq_length | int | Yes | 65536 for SmolLM3 SFT |
| args.use_liger_kernel | bool | No | True for Liger kernel optimization |
| args.chat_template | str | No | Custom Jinja2 template with thinking mode support |
| train_dataset | Dataset | Yes | 25-split dataset mixture with varied weights |
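The assistant_only_loss input restricts the training loss to assistant tokens by setting every other label to the ignore index (-100), which PyTorch's cross-entropy loss skips. The sketch below illustrates the masking idea only; the helper name is hypothetical, and TRL derives the assistant mask from the chat template rather than taking explicit role spans:

```python
IGNORE_INDEX = -100  # label value ignored by cross-entropy loss

def mask_non_assistant_labels(input_ids, assistant_mask):
    """Return labels where only assistant tokens contribute to the loss.

    input_ids:      token ids of the formatted conversation
    assistant_mask: 1 where a token belongs to an assistant turn, else 0
    """
    return [
        tok if is_assistant else IGNORE_INDEX
        for tok, is_assistant in zip(input_ids, assistant_mask)
    ]

# User-prompt tokens (mask 0) followed by an assistant reply (mask 1)
labels = mask_non_assistant_labels(
    input_ids=[101, 2023, 2003, 102, 7592, 2088, 999],
    assistant_mask=[0, 0, 0, 0, 1, 1, 1],
)
print(labels)  # [-100, -100, -100, -100, 7592, 2088, 999]
```

The effect is that gradient updates are driven only by the model's own responses, not by user prompts or system text it merely conditions on.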
Outputs
| Name | Type | Description |
|---|---|---|
| checkpoints | Files | SFT model checkpoint at output_dir (e.g., data/SmolLM3-SFT) |
| metrics | Dict | Training metrics (loss, learning rate, throughput) |
Usage Examples
Multi-Task SFT YAML Config (Abbreviated)
# From recipes/smollm3/sft/sft.yaml (abbreviated)
model_name_or_path: HuggingFaceTB/SmolLM3-3B-checkpoints
model_revision: it-mid-training
trust_remote_code: true
torch_dtype: bfloat16
attn_implementation: flash_attention_2

# Multi-task SFT settings
assistant_only_loss: true
packing: true
packing_strategy: ffd
max_seq_length: 65536
use_liger_kernel: true
num_train_epochs: 4
learning_rate: 3.0e-5
output_dir: data/SmolLM3-SFT

# Custom chat template with thinking mode support
chat_template: >
  {# Template with enable_thinking support #}
  ...

# 25-split dataset mixture (abbreviated)
dataset_mixture:
  datasets:
    - id: HuggingFaceTB/smoltalk2
      config: everyday-conversations_think
      columns: [messages]
      weight: 0.3
    - id: HuggingFaceTB/smoltalk2
      config: smol-magpie-ultra_think
      columns: [messages]
      weight: 0.15
    - id: HuggingFaceTB/smoltalk2
      config: OpenMathReasoning_think
      columns: [messages]
      weight: 0.5
    # ... 22 more splits with weights from 0.02 to 1.0
  seed: 42
  test_split_size: 0.005
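Each split's weight controls how much of that split enters the mixture before everything is concatenated and shuffled. A rough sketch of how per-split weights could translate into example counts, assuming the weight acts as a subsampling fraction (the function name and split sizes are illustrative; the handbook's get_dataset performs the actual mixing):

```python
import math

def mixture_counts(split_sizes, weights):
    """Examples drawn from each split: floor(split size * weight)."""
    return {
        name: math.floor(size * weights[name])
        for name, size in split_sizes.items()
    }

# Hypothetical split sizes paired with the recipe's weights
sizes = {"everyday-conversations_think": 10_000, "OpenMathReasoning_think": 40_000}
weights = {"everyday-conversations_think": 0.3, "OpenMathReasoning_think": 0.5}
print(mixture_counts(sizes, weights))
# {'everyday-conversations_think': 3000, 'OpenMathReasoning_think': 20000}
```

Note that a weight is relative to its own split's size, so a small split with weight 1.0 can still contribute fewer examples than a large split with weight 0.1.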
CLI Launch (Multi-Node)
# Multi-task SFT on 8 nodes with DeepSpeed ZeRO-3
accelerate launch --config_file recipes/accelerate_configs/zero3.yaml \
  --num_machines 8 \
  scripts/sft.py \
  --config recipes/smollm3/sft/sft.yaml