Principle: SFT Model Loading
| Knowledge Sources | |
|---|---|
| Domains | NLP, Deep_Learning |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
The process of loading a pre-trained causal language model as the starting point for supervised fine-tuning.
Description
SFT model loading brings in a pre-trained base model (or an intermediate checkpoint from pre-training) as the starting point for supervised fine-tuning. Unlike pre-training initialization, the model loaded for SFT is expected to already have general language understanding. FlashAttention-2 is enabled for training efficiency. The key difference from pre-training model loading is intent: the model will be fine-tuned on instruction-response data rather than continue pre-training on raw text.
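A minimal loading sketch, assuming the Hugging Face `transformers` API; the checkpoint name, dtype, and the SDPA fallback are illustrative choices, not prescribed by the source:

```python
def select_attn_implementation(use_flash_attn: bool) -> str:
    # FlashAttention-2 for training efficiency; fall back to PyTorch's
    # SDPA when the flash-attn package or a supported GPU is unavailable.
    return "flash_attention_2" if use_flash_attn else "sdpa"


def load_sft_base_model(checkpoint: str, use_flash_attn: bool = True):
    # Heavy imports kept local so the helper above stays dependency-free.
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        checkpoint,
        torch_dtype=torch.bfloat16,  # bf16 is a common SFT precision choice
        attn_implementation=select_attn_implementation(use_flash_attn),
    )


# Example (checkpoint name is a placeholder):
# model = load_sft_base_model("my-org/pretrained-base")
```

Because the architecture is unchanged from pre-training, the same `AutoModelForCausalLM` class that saved the checkpoint can reload it directly.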
Usage
Use this when starting supervised fine-tuning. Load the base model that has already been pre-trained, then pass it to a Trainer with SFT-formatted data.
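"SFT-formatted data" means instruction-response pairs rendered into a single training string. The template below is a hypothetical sketch; real projects typically use the tokenizer's chat template or the trainer's own formatting function:

```python
def format_sft_example(instruction: str, response: str) -> str:
    # Illustrative chat-style template; the <|user|>/<|assistant|>
    # markers are placeholders, not a specific model's special tokens.
    return f"<|user|>\n{instruction}\n<|assistant|>\n{response}"


# Each formatted string would then be tokenized and passed to a Trainer
# alongside the loaded base model, e.g.:
# texts = [format_sft_example(ex["instruction"], ex["response"]) for ex in data]
```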
Theoretical Basis
SFT model loading follows the transfer learning paradigm:
- Load pre-trained weights that encode general language knowledge.
- All parameters are trainable (full fine-tuning).
- The model architecture remains unchanged from pre-training.
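The full fine-tuning point above can be verified programmatically: with no layers frozen, the trainable and total parameter counts coincide. The helper below is a generic sketch (not from the source), demonstrated on a toy `nn.Linear` standing in for the loaded base model:

```python
import torch.nn as nn


def trainable_param_count(model: nn.Module):
    """Return (trainable, total) parameter counts for a model."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return trainable, total


# In full fine-tuning nothing is frozen, so the two counts match.
toy = nn.Linear(4, 2)  # stand-in for the loaded base model
trainable, total = trainable_param_count(toy)
# 4*2 weights + 2 biases = 10 parameters, all with requires_grad=True
```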