Implementation: AutoModelForCausalLM.from_pretrained for SFT (LLMBook-zh, llmbook-zh.github.io)
| Knowledge Sources | |
|---|---|
| Domains | NLP, Deep_Learning |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A HuggingFace Transformers entry point for loading a pre-trained causal language model with the FlashAttention-2 backend, used here as the starting point for supervised fine-tuning (SFT).
Description
In the SFT context, AutoModelForCausalLM.from_pretrained loads a base model for full-parameter fine-tuning on instruction-response data. The call itself is identical to the one used when loading a model for pre-training; what differs is the downstream wiring, where the model is trained against SFTDataset and DataCollatorForSupervisedDataset.
This is a Wrapper Doc documenting how the LLMBook repository uses AutoModelForCausalLM for supervised fine-tuning specifically.
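To make the SFTDataset side of that wiring concrete, the sketch below shows how an SFT dataset typically turns an instruction-response pair into training tensors: prompt tokens are masked out of the labels so that loss is computed only on the response. This is a minimal illustration of the general pattern, not the repository's actual code; the names `build_example`, `prompt_ids`, and `response_ids` are hypothetical.

```python
# Hedged sketch of SFT label construction (illustrative names, plain lists).
IGNORE_INDEX = -100  # label value ignored by PyTorch cross-entropy loss

def build_example(prompt_ids, response_ids, eos_id):
    """Concatenate prompt and response token ids, masking the prompt
    positions in the labels so only response tokens contribute to the loss."""
    input_ids = prompt_ids + response_ids + [eos_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids + [eos_id]
    return {"input_ids": input_ids, "labels": labels}
```

With this convention, the model still attends to the full prompt, but gradient signal flows only from the response and EOS positions.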
Usage
Use this to load the base model before passing it to Trainer with SFTDataset and DataCollatorForSupervisedDataset.
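The collator's role in that pipeline can be sketched as follows. This is a simplified, hypothetical stand-in for the repository's DataCollatorForSupervisedDataset: the real class returns torch tensors, while this version uses plain lists so the padding and masking logic stays visible. The assumption is that prompt tokens already carry the ignore label when examples reach the collator.

```python
# Hedged sketch: right-pad a batch of SFT examples to a common length.
IGNORE_INDEX = -100  # padded label positions contribute no loss

def collate(batch, pad_token_id=0):
    """Pad input_ids with pad_token_id, labels with IGNORE_INDEX, and
    build an attention mask that zeros out the padded positions."""
    max_len = max(len(ex["input_ids"]) for ex in batch)
    input_ids, labels, attention_mask = [], [], []
    for ex in batch:
        pad = max_len - len(ex["input_ids"])
        input_ids.append(ex["input_ids"] + [pad_token_id] * pad)
        labels.append(ex["labels"] + [IGNORE_INDEX] * pad)
        attention_mask.append([1] * len(ex["input_ids"]) + [0] * pad)
    return {"input_ids": input_ids, "labels": labels,
            "attention_mask": attention_mask}
```

Padding labels with IGNORE_INDEX rather than the pad token id is the key detail: it keeps padding from contributing to the loss.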
Code Reference
Source Location
- Repository: LLMBook-zh
- File: code/7.1 SFT实践.py
- Lines: 76
Signature
```python
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path: str,
    attn_implementation: str = "flash_attention_2",
) -> PreTrainedModel
```
Import
```python
from transformers import AutoModelForCausalLM
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_name_or_path | str | Yes | HuggingFace model ID or local path |
| attn_implementation | str | No | Attention backend; "flash_attention_2" requires the flash-attn package and fp16/bf16 weights |
Outputs
| Name | Type | Description |
|---|---|---|
| return | PreTrainedModel | Model initialized with pre-trained weights for SFT |
Usage Examples
```python
from transformers import AutoModelForCausalLM

# Note: the FlashAttention-2 backend requires the flash-attn package
# and half-precision weights (pass torch_dtype=torch.bfloat16 or
# torch.float16 when loading).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    attn_implementation="flash_attention_2",
)
# Model is now ready for supervised fine-tuning
```