Implementation: AutoModelForCausalLM.from_pretrained for SFT (LLMBook-zh, llmbook-zh.github.io)
| Knowledge Sources | |
|---|---|
| Domains | NLP, Deep_Learning |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A HuggingFace Transformers entry point for loading a pre-trained causal language model with the FlashAttention-2 backend, used here as the starting point for supervised fine-tuning (SFT).
Description
In the SFT context, AutoModelForCausalLM.from_pretrained loads a base model for full-parameter fine-tuning on instruction-response data. The call itself is identical to the one used when loading a model for pre-training; what differs is the downstream wiring, where the model is trained against SFTDataset and DataCollatorForSupervisedDataset.
This is a Wrapper Doc documenting how the LLMBook repository uses AutoModelForCausalLM for supervised fine-tuning specifically.
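To make the SFTDataset side of that wiring concrete, the sketch below shows how an SFT dataset typically turns an instruction-response pair into training tensors: prompt tokens are masked out of the labels so that loss is computed only on the response. This is a minimal illustration of the general pattern, not the repository's actual code; the names `build_example`, `prompt_ids`, and `response_ids` are hypothetical.

```python
# Hedged sketch of SFT label construction (illustrative names, plain lists).
IGNORE_INDEX = -100  # label value ignored by PyTorch cross-entropy loss

def build_example(prompt_ids, response_ids, eos_id):
    """Concatenate prompt and response token ids, masking the prompt
    positions in the labels so only response tokens contribute to the loss."""
    input_ids = prompt_ids + response_ids + [eos_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids + [eos_id]
    return {"input_ids": input_ids, "labels": labels}
```

With this convention, the model still attends to the full prompt, but gradient signal flows only from the response and EOS positions.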
Usage
Use this to load the base model before passing it to Trainer with SFTDataset and DataCollatorForSupervisedDataset.
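The collator's role in that pipeline can be sketched as follows. This is a simplified, hypothetical stand-in for the repository's DataCollatorForSupervisedDataset: the real class returns torch tensors, while this version uses plain lists so the padding and masking logic stays visible. The assumption is that prompt tokens already carry the ignore label when examples reach the collator.

```python
# Hedged sketch: right-pad a batch of SFT examples to a common length.
IGNORE_INDEX = -100  # padded label positions contribute no loss

def collate(batch, pad_token_id=0):
    """Pad input_ids with pad_token_id, labels with IGNORE_INDEX, and
    build an attention mask that zeros out the padded positions."""
    max_len = max(len(ex["input_ids"]) for ex in batch)
    input_ids, labels, attention_mask = [], [], []
    for ex in batch:
        pad = max_len - len(ex["input_ids"])
        input_ids.append(ex["input_ids"] + [pad_token_id] * pad)
        labels.append(ex["labels"] + [IGNORE_INDEX] * pad)
        attention_mask.append([1] * len(ex["input_ids"]) + [0] * pad)
    return {"input_ids": input_ids, "labels": labels,
            "attention_mask": attention_mask}
```

Padding labels with IGNORE_INDEX rather than the pad token id is the key detail: it keeps padding from contributing to the loss.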
Code Reference
Source Location
- Repository: LLMBook-zh
- File: code/7.1 SFT实践.py
- Lines: 76
Signature
```python
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path: str,
    attn_implementation: str = "flash_attention_2",
) -> PreTrainedModel
```
Import
```python
from transformers import AutoModelForCausalLM
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_name_or_path | str | Yes | HuggingFace model ID or local path |
| attn_implementation | str | No | Attention backend; "flash_attention_2" requires the flash-attn package and fp16/bf16 weights |
Outputs
| Name | Type | Description |
|---|---|---|
| return | PreTrainedModel | Model initialized with pre-trained weights for SFT |
Usage Examples
```python
from transformers import AutoModelForCausalLM

# Note: the FlashAttention-2 backend requires the flash-attn package
# and half-precision weights (pass torch_dtype=torch.bfloat16 or
# torch.float16 when loading).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    attn_implementation="flash_attention_2",
)
# Model is now ready for supervised fine-tuning
```