
Implementation:Huggingface Transformers AutoModelForCausalLM From Pretrained For Training

From Leeroopedia
Knowledge Sources
Domains NLP, Training, Deep Learning
Last Updated 2026-02-13 00:00 GMT

Overview

Concrete tool for loading a pretrained causal language model ready for fine-tuning, provided by the HuggingFace Transformers library.

Description

AutoModelForCausalLM.from_pretrained() is a factory class method defined in the auto_factory module that automatically resolves the correct causal language model class (e.g., LlamaForCausalLM, GPT2LMHeadModel, MistralForCausalLM) based on the model's configuration and loads its pretrained weights. The method follows a three-step dispatch pattern:

  1. Load or receive the model configuration to determine the model_type.
  2. Look up the model type in MODEL_FOR_CAUSAL_LM_MAPPING to find the concrete class.
  3. Delegate to that class's own from_pretrained() to instantiate and load weights.
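The three-step dispatch above can be sketched in miniature. This is an illustrative toy, not the real transformers source: the class and registry names mirror the real ones, but the bodies are stubs.

```python
# Toy sketch of the Auto-class dispatch pattern (illustration only --
# not the actual transformers implementation).

class GPT2LMHeadModel:
    @classmethod
    def from_pretrained(cls, name_or_path, **kwargs):
        # Real code would load the config and pretrained weights here.
        return cls()

class LlamaForCausalLM:
    @classmethod
    def from_pretrained(cls, name_or_path, **kwargs):
        return cls()

# Stand-in for MODEL_FOR_CAUSAL_LM_MAPPING: model_type -> concrete class.
CAUSAL_LM_REGISTRY = {
    "gpt2": GPT2LMHeadModel,
    "llama": LlamaForCausalLM,
}

def auto_from_pretrained(name_or_path, model_type, **kwargs):
    # Step 1 (simplified): model_type would normally be read from the
    # config resolved from name_or_path.
    model_cls = CAUSAL_LM_REGISTRY[model_type]                # step 2: look up
    return model_cls.from_pretrained(name_or_path, **kwargs)  # step 3: delegate

model = auto_from_pretrained("gpt2", model_type="gpt2")
print(type(model).__name__)  # GPT2LMHeadModel
```

This is why `AutoModelForCausalLM` itself is never instantiated: the object you get back is an instance of the concrete class the registry resolved.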

When loading for training, key parameters include torch_dtype (to control precision), device_map (to distribute across GPUs), and attn_implementation (to select optimized attention kernels like Flash Attention 2 or SDPA).
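As a rough illustration of why torch_dtype matters when loading for training, weight memory scales with dtype width. The 7B parameter count below is an assumption (a Llama-2-7B-class model), and this back-of-envelope sketch covers weights only:

```python
# Back-of-envelope weight memory for a 7B-parameter model per dtype.
# Gradients, optimizer state, and activations add substantially more.
NUM_PARAMS = 7_000_000_000  # assumed Llama-2-7B-class parameter count

BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,     # typical torch_dtype for mixed-precision training
    "float16": 2.0,
    "nf4 (4-bit)": 0.5,  # QLoRA-style quantized weights
}

for name, nbytes in BYTES_PER_PARAM.items():
    gib = NUM_PARAMS * nbytes / 1024**3
    print(f"{name:>12}: {gib:5.1f} GiB")
```

Halving the weight footprint with torch.bfloat16 is often what makes a 7B-class model fit on a single training GPU at all.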

Usage

Use AutoModelForCausalLM.from_pretrained() when starting a fine-tuning or continued pretraining workflow for a causal (autoregressive) language model. This should be called after setting up the tokenizer and before initializing the Trainer.

Code Reference

Source Location

  • Repository: transformers
  • File: src/transformers/models/auto/auto_factory.py (lines 250-380)

Signature

@classmethod
def from_pretrained(cls, pretrained_model_name_or_path: str | os.PathLike[str], *model_args, **kwargs):

Import

from transformers import AutoModelForCausalLM

I/O Contract

Inputs

Name Type Required Description
pretrained_model_name_or_path str or os.PathLike Yes Model ID on the HuggingFace Hub (e.g., "meta-llama/Llama-2-7b-hf") or path to a local directory containing model weights and config
*model_args positional args No Additional positional arguments passed to the underlying model class
config PretrainedConfig No Model configuration. If not provided, loaded from pretrained_model_name_or_path
torch_dtype torch.dtype or str No Data type for model weights. Use torch.float16, torch.bfloat16, or "auto" to infer from the checkpoint
device_map str or dict No Device placement strategy: "auto" for automatic distribution, "cpu", "cuda:0", or a custom mapping dict
attn_implementation str No Attention implementation to use: "eager", "sdpa" (Scaled Dot-Product Attention), or "flash_attention_2"
trust_remote_code bool No Whether to allow custom model code from the Hub (defaults to False)
quantization_config QuantizationConfig No Configuration for model quantization (e.g., BitsAndBytesConfig for 4-bit/8-bit quantization)
cache_dir str No Directory to cache downloaded model files
revision str No Model version to use (branch, tag, or commit hash; defaults to "main")
token str or bool No Authentication token for accessing gated or private models
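For illustration, a custom device_map dict maps module-name prefixes to devices. The module names below are assumptions based on a Llama-style module tree, and the two-layer split is hypothetical:

```python
# Hypothetical explicit device_map splitting a 2-layer model across
# two GPUs; keys are module-name prefixes (assumed Llama-style names),
# values are device indices or strings like "cpu"/"disk".
device_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": 1,
    "model.norm": 1,
    "lm_head": 1,
}

# Every parameter whose qualified name starts with a key is placed on
# the mapped device; device_map="auto" instead derives such a mapping
# from the available GPU/CPU memory.
gpu0 = [k for k, v in device_map.items() if v == 0]
print(gpu0)  # ['model.embed_tokens', 'model.layers.0']
```

Passing "auto" is the common choice; an explicit dict like this is mainly useful when you need to pin specific modules (for example, keeping the LM head with the final layer).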

Outputs

Name Type Description
model PreTrainedModel An instantiated causal language model with pretrained weights loaded, ready for training or inference

Usage Examples

Basic Usage

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

Loading for Fine-Tuning with BF16

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)

Loading with 4-bit Quantization for QLoRA

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

Related Pages

Implements Principle
