Principle: SFT Model Loading
| Knowledge Sources | |
|---|---|
| Domains | NLP, Deep_Learning |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
The process of loading a pre-trained causal language model as the starting point for supervised fine-tuning.
Description
SFT model loading brings in a pre-trained base model (or an intermediate checkpoint from pre-training) as the starting point for supervised fine-tuning. Unlike pre-training initialization, the model loaded for SFT is expected to already have general language understanding. FlashAttention-2 is enabled for training efficiency. The key difference from pre-training model loading is intent: the model will be fine-tuned on instruction-response data rather than continue pre-training on raw text.
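A minimal loading sketch, assuming the Hugging Face `transformers` API; the checkpoint name, dtype, and the SDPA fallback are illustrative choices, not prescribed by the source:

```python
def select_attn_implementation(use_flash_attn: bool) -> str:
    # FlashAttention-2 for training efficiency; fall back to PyTorch's
    # SDPA when the flash-attn package or a supported GPU is unavailable.
    return "flash_attention_2" if use_flash_attn else "sdpa"


def load_sft_base_model(checkpoint: str, use_flash_attn: bool = True):
    # Heavy imports kept local so the helper above stays dependency-free.
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        checkpoint,
        torch_dtype=torch.bfloat16,  # bf16 is a common SFT precision choice
        attn_implementation=select_attn_implementation(use_flash_attn),
    )


# Example (checkpoint name is a placeholder):
# model = load_sft_base_model("my-org/pretrained-base")
```

Because the architecture is unchanged from pre-training, the same `AutoModelForCausalLM` class that saved the checkpoint can reload it directly.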
Usage
Use this when starting supervised fine-tuning. Load the base model that has already been pre-trained, then pass it to a Trainer with SFT-formatted data.
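"SFT-formatted data" means instruction-response pairs rendered into a single training string. The template below is a hypothetical sketch; real projects typically use the tokenizer's chat template or the trainer's own formatting function:

```python
def format_sft_example(instruction: str, response: str) -> str:
    # Illustrative chat-style template; the <|user|>/<|assistant|>
    # markers are placeholders, not a specific model's special tokens.
    return f"<|user|>\n{instruction}\n<|assistant|>\n{response}"


# Each formatted string would then be tokenized and passed to a Trainer
# alongside the loaded base model, e.g.:
# texts = [format_sft_example(ex["instruction"], ex["response"]) for ex in data]
```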
Theoretical Basis
SFT model loading follows the transfer learning paradigm:
- Load pre-trained weights that encode general language knowledge.
- All parameters are trainable (full fine-tuning).
- The model architecture remains unchanged from pre-training.
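The full fine-tuning point above can be verified programmatically: with no layers frozen, the trainable and total parameter counts coincide. The helper below is a generic sketch (not from the source), demonstrated on a toy `nn.Linear` standing in for the loaded base model:

```python
import torch.nn as nn


def trainable_param_count(model: nn.Module):
    """Return (trainable, total) parameter counts for a model."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return trainable, total


# In full fine-tuning nothing is frozen, so the two counts match.
toy = nn.Linear(4, 2)  # stand-in for the loaded base model
trainable, total = trainable_param_count(toy)
# 4*2 weights + 2 biases = 10 parameters, all with requires_grad=True
```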