Principle: CarperAI Trlx SFT Configuration
| Knowledge Sources | |
|---|---|
| Domains | Supervised_Learning, NLP, Configuration |
| Last Updated | 2026-02-07 16:00 GMT |
Overview
A configuration principle that defines the hyperparameters for supervised fine-tuning of language models on text or instruction-following datasets.
Description
Supervised Fine-Tuning (SFT) is the process of training a pre-trained language model on curated text data using the standard next-token prediction objective (cross-entropy loss). In the RLHF pipeline, SFT is typically the first stage: the base model is fine-tuned on demonstration data before reward model training and RL optimization. SFT configuration is simpler than PPO or ILQL since it does not require RL-specific parameters, but generation kwargs are still needed for periodic evaluation during training.
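The shape of an SFT configuration can be sketched as a plain dictionary mirroring the section layout of trlx's `TRLConfig` (train / model / tokenizer / optimizer / method). The specific values below are illustrative assumptions, not recommended defaults, and the trainer and method names are taken from trlx's conventions as best understood here.

```python
# Illustrative SFT configuration sketch, mirroring the section layout of
# trlx's TRLConfig. All values are example assumptions, not tuned defaults.
sft_config = {
    "train": {
        "seq_length": 1024,        # maximum sequence length for truncation
        "batch_size": 8,           # training batch size
        "epochs": 3,
        "trainer": "AccelerateSFTTrainer",
    },
    "model": {
        "model_path": "gpt2",      # base pre-trained model (example)
        "num_layers_unfrozen": -1, # -1 leaves all layers trainable
    },
    "tokenizer": {
        "tokenizer_path": "gpt2",
        "truncation_side": "right",
    },
    "optimizer": {
        "name": "adamw",
        "kwargs": {"lr": 1e-5},    # typical SFT range: 1e-5 to 1e-4
    },
    "method": {
        "name": "sftconfig",
        "gen_kwargs": {            # used for periodic evaluation
            "max_new_tokens": 128,
            "do_sample": True,
            "top_p": 0.95,
        },
    },
}
```

Note that, unlike PPO or ILQL configurations, the `method` section carries no RL-specific parameters; only the generation kwargs for evaluation remain.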
Usage
Use SFT configuration when you want to fine-tune a language model on a dataset of text samples or instruction-response pairs. SFT is appropriate when you have high-quality demonstration data and want the model to learn to produce similar outputs. It serves as the foundation stage in RLHF pipelines and as a standalone method for instruction tuning.
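As a concrete sketch of preparing instruction-response pairs for SFT, each pair can be flattened into a single training string. The instruction/response template below is a hypothetical formatting choice for illustration, not a trlx requirement.

```python
# Flatten instruction-response pairs into plain training samples.
# The "### Instruction / ### Response" template is a hypothetical example format.
def format_sample(instruction: str, response: str) -> str:
    return f"### Instruction:\n{instruction}\n### Response:\n{response}"

pairs = [
    ("Translate 'bonjour' to English.", "Hello."),
    ("Name the first stage of the RLHF pipeline.", "Supervised fine-tuning."),
]
samples = [format_sample(i, r) for i, r in pairs]
```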
Theoretical Basis
SFT minimizes the standard autoregressive cross-entropy loss over each training sequence $x = (x_1, \dots, x_T)$:

$$\mathcal{L}_{\text{SFT}}(\theta) = -\sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t})$$
For dialogue-format data with prompt-completion pairs, the loss is masked so that it is computed only on completion tokens:

$$\mathcal{L}_{\text{SFT}}(\theta) = -\sum_{t \in \mathcal{C}} \log p_\theta(x_t \mid x_{<t})$$

where $\mathcal{C}$ is the set of completion-token positions; prompt tokens contribute no gradient.
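The masking can be illustrated in plain Python: given per-token log-probabilities and a 0/1 mask marking completion positions, only masked positions contribute to the mean loss. This is a minimal sketch; real implementations operate on logits tensors in the training framework.

```python
# Minimal sketch of a completion-masked cross-entropy (negative log-likelihood).
# token_logprobs[t] = log p(x_t | x_<t); mask[t] = 1 for completion tokens.
def masked_nll(token_logprobs, mask):
    assert len(token_logprobs) == len(mask)
    total = -sum(lp for lp, m in zip(token_logprobs, mask) if m)
    n = sum(mask)
    return total / n if n else 0.0

# Two prompt tokens (mask 0) are excluded; only the completion tokens count.
loss = masked_nll([-0.1, -0.2, -0.5, -0.7], [0, 0, 1, 1])  # (0.5 + 0.7) / 2
```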
Key configuration concerns:
- seq_length → Maximum sequence length for truncation
- batch_size → Training batch size (affects gradient noise)
- learning_rate → Controls step size (typically 1e-5 to 1e-4)
- num_layers_unfrozen → Controls partial freezing (-1 for all layers)
- gen_kwargs → Generation parameters for periodic evaluation
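The semantics of `num_layers_unfrozen` can be sketched as follows: -1 leaves every layer trainable, while a non-negative N freezes all but the last N layers. The helper name is hypothetical; trlx applies this freezing internally when building the model.

```python
# Sketch of partial freezing controlled by num_layers_unfrozen.
# -1 trains every layer; N >= 0 trains only the last N layers.
def trainable_layer_indices(total_layers: int, num_layers_unfrozen: int) -> list:
    if num_layers_unfrozen < 0:
        return list(range(total_layers))
    return list(range(total_layers - num_layers_unfrozen, total_layers))

# With 12 transformer layers and 2 unfrozen, only layers 10 and 11 train.
last_two = trainable_layer_indices(12, 2)
```

Freezing earlier layers reduces memory and compute at the cost of adaptation capacity, which is why -1 (full fine-tuning) is the common SFT default.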