
Implementation: AutoModelForCausalLM.from_pretrained for Pre-training (LLMBook-zh, llmbook-zh.github.io)

From Leeroopedia


Knowledge Sources
Domains: NLP, Deep_Learning
Last Updated: 2026-02-08 00:00 GMT

Overview

A concrete HuggingFace Transformers API for loading pre-trained causal language models, used in this repository with FlashAttention-2 enabled for pre-training.

Description

AutoModelForCausalLM.from_pretrained automatically loads the correct model architecture and pre-trained weights based on the model name or path. In the pre-training context of this repository, it is used to load LLaMA-2 models with FlashAttention-2 enabled for efficient training.

This is a Wrapper Doc — it documents how the LLMBook repository uses the HuggingFace Transformers external API.

Usage

Use this when loading a base model for continued pre-training. Pass the model to HuggingFace Trainer along with a PTDataset.

Code Reference

Source Location

  • Repository: LLMBook-zh
  • File: code/6.2 预训练实践.py
  • Lines: 55

Signature

model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path: str,
    attn_implementation: str = "flash_attention_2"
) -> PreTrainedModel

Import

from transformers import AutoModelForCausalLM


I/O Contract

Inputs

  • model_name_or_path (str, required): HuggingFace model ID or local path (e.g., "meta-llama/Llama-2-7b-hf")
  • attn_implementation (str, optional, default "flash_attention_2"): attention backend

Outputs

  • return (PreTrainedModel): initialized model with pre-trained weights

Usage Examples

from transformers import AutoModelForCausalLM

# Load LLaMA-2 with FlashAttention-2 for pre-training
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    attn_implementation="flash_attention_2"
)
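Note that the FlashAttention-2 kernels run only in fp16/bf16, so in practice the call typically also passes torch_dtype explicitly; a hedged variant of the example above:

```python
import torch
from transformers import AutoModelForCausalLM

# Explicit half-precision dtype: the FlashAttention-2 backend
# does not support fp32 weights.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
)
```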

