
Principle: SFT Model Loading (LLMBook-zh, llmbook-zh.github.io)

From Leeroopedia


Knowledge Sources
Domains NLP, Deep_Learning
Last Updated 2026-02-08 00:00 GMT

Overview

SFT model loading is the process of loading a pre-trained causal language model as the starting point for supervised fine-tuning.

Description

SFT model loading loads a pre-trained base model (or a checkpoint from an earlier training run) as the starting point for supervised fine-tuning. Unlike pre-training initialization, the model loaded for SFT is expected to already have general language understanding. FlashAttention-2 is enabled for training efficiency. The key difference from pre-training model loading is intent: the model will be fine-tuned on instruction-response data rather than continue pre-training on raw text.
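A minimal sketch of this loading step with Hugging Face Transformers. A tiny randomly initialized GPT-2 stands in for a real pre-trained checkpoint (the checkpoint path, model size, and dtype are illustrative assumptions, not taken from the source); the save/load round trip mirrors loading a base model from disk or the Hub.

```python
# Sketch of SFT model loading with Hugging Face Transformers.
# A tiny randomly initialized GPT-2 stands in for a real pre-trained
# checkpoint; the save/load round trip mirrors loading from disk or the Hub.
import tempfile

import torch
from transformers import AutoModelForCausalLM, GPT2Config, GPT2LMHeadModel

config = GPT2Config(n_layer=2, n_head=2, n_embd=64, vocab_size=128)
with tempfile.TemporaryDirectory() as ckpt_dir:
    # Stand-in "pre-trained" checkpoint written to disk.
    GPT2LMHeadModel(config).save_pretrained(ckpt_dir)

    # In a real SFT run, ckpt_dir would be the base model's path or Hub id,
    # and on supported GPUs you would additionally pass
    # attn_implementation="flash_attention_2" (with torch_dtype=torch.bfloat16)
    # to enable FlashAttention-2.
    model = AutoModelForCausalLM.from_pretrained(ckpt_dir)

model.train()  # SFT updates the weights, so the model goes into training mode
input_ids = torch.randint(0, config.vocab_size, (1, 8))
logits = model(input_ids).logits  # shape: (batch, seq_len, vocab_size)
```

The loaded model is then handed to a trainer together with SFT-formatted data; nothing about the architecture changes at this step.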

Usage

Use this when starting supervised fine-tuning. Load the base model that has already been pre-trained, then pass it to a Trainer with SFT-formatted data.
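To make "SFT-formatted data" concrete, here is a small trainer-agnostic sketch of the usual formatting convention: prompt and response token ids are concatenated into one sequence, and the prompt positions in the labels are masked with -100 so the loss is computed only on the response. The helper name and the toy token ids are illustrative assumptions.

```python
# Illustration of SFT-style example formatting: concatenate prompt and
# response ids, and mask prompt positions in the labels with -100 so the
# loss covers only the response tokens.
IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss

def build_sft_example(prompt_ids, response_ids, eos_id):
    """Return (input_ids, labels) for one instruction-response pair."""
    input_ids = prompt_ids + response_ids + [eos_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids + [eos_id]
    return input_ids, labels

# Toy token ids standing in for a tokenized instruction and its answer.
prompt_ids = [5, 17, 23]
response_ids = [42, 8]
input_ids, labels = build_sft_example(prompt_ids, response_ids, eos_id=2)
print(input_ids)  # [5, 17, 23, 42, 8, 2]
print(labels)     # [-100, -100, -100, 42, 8, 2]
```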

Theoretical Basis

SFT model loading follows the transfer learning paradigm:

  1. Pre-trained weights that encode general language knowledge are loaded.
  2. All parameters remain trainable (full fine-tuning, as opposed to parameter-efficient methods that freeze most weights).
  3. The model architecture is unchanged from pre-training; only the data and training intent differ.
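Point 2 above can be sanity-checked in a few lines. This sketch uses a small stand-in torch module rather than a real language model; in practice you would run the same check on the loaded LM before training.

```python
# Minimal check that full fine-tuning leaves every parameter trainable.
# A small stand-in torch module replaces the real loaded language model.
import torch.nn as nn

stub_model = nn.Sequential(nn.Embedding(128, 64), nn.Linear(64, 128))
trainable = sum(p.numel() for p in stub_model.parameters() if p.requires_grad)
total = sum(p.numel() for p in stub_model.parameters())
assert trainable == total  # full fine-tuning: no frozen parameters
```

Parameter-efficient methods such as LoRA would fail this check by design, since they freeze the base weights.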

Related Pages

Implemented By
