Implementation: Hugging Face Transformers AutoModelForCausalLM.from_pretrained for Training
| Knowledge Sources | |
|---|---|
| Domains | NLP, Training, Deep Learning |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Concrete tool for loading a pretrained causal language model ready for fine-tuning, provided by the Hugging Face Transformers library.
Description
AutoModelForCausalLM.from_pretrained() is a factory class method defined in the auto_factory module that automatically resolves the correct causal language model class (e.g., LlamaForCausalLM, GPT2LMHeadModel, MistralForCausalLM) based on the model's configuration and loads its pretrained weights. The method follows a three-step dispatch pattern:
- Load or receive the model configuration to determine the model_type.
- Look up the model type in MODEL_FOR_CAUSAL_LM_MAPPING to find the concrete class.
- Delegate to that class's own from_pretrained() to instantiate and load weights.
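The three-step dispatch above can be sketched with a plain registry dict. This is an illustrative stand-in, not the real implementation: DummyConfig, DummyLMModel, REGISTRY, and auto_from_pretrained are all hypothetical names standing in for transformers' AutoConfig, concrete model classes, and MODEL_FOR_CAUSAL_LM_MAPPING.

```python
# Illustrative sketch of the Auto-class dispatch pattern; all names here
# are hypothetical stand-ins for the transformers internals.

class DummyConfig:
    model_type = "dummy-lm"

class DummyLMModel:
    @classmethod
    def from_pretrained(cls, name_or_path, **kwargs):
        # Step 3: the concrete class instantiates itself and loads weights.
        model = cls()
        model.name_or_path = name_or_path
        return model

# Stand-in for MODEL_FOR_CAUSAL_LM_MAPPING: model_type -> concrete class.
REGISTRY = {"dummy-lm": DummyLMModel}

def auto_from_pretrained(name_or_path, **kwargs):
    # Step 1: load the configuration to determine the model_type.
    config = DummyConfig()  # real code: AutoConfig.from_pretrained(name_or_path)
    # Step 2: look up the concrete class for that model_type.
    model_class = REGISTRY[config.model_type]
    # Step 3: delegate to that class's own from_pretrained().
    return model_class.from_pretrained(name_or_path, **kwargs)

model = auto_from_pretrained("org/dummy-lm-7b")
```

This registry-plus-delegation shape is why AutoModelForCausalLM itself is never instantiated: the object you get back is always an instance of the concrete class resolved in step 2.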
When loading for training, key parameters include torch_dtype (to control precision), device_map (to distribute across GPUs), and attn_implementation (to select optimized attention kernels like Flash Attention 2 or SDPA).
Usage
Use AutoModelForCausalLM.from_pretrained() when starting a fine-tuning or continued pretraining workflow for a causal (autoregressive) language model. This should be called after setting up the tokenizer and before initializing the Trainer.
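A minimal sketch of that ordering is below; build_trainer, the train_dataset argument, and the TrainingArguments values are illustrative placeholders, not part of the library API.

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

def build_trainer(model_id, train_dataset):
    """Hypothetical helper showing the usual setup order for fine-tuning."""
    # 1. Tokenizer first, so special tokens (e.g., a pad token) can be
    #    checked and aligned with the model before training starts.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # 2. Then the model.
    model = AutoModelForCausalLM.from_pretrained(model_id)
    # 3. Finally the Trainer, which receives both.
    args = TrainingArguments(output_dir="outputs")
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        tokenizer=tokenizer,
    )
```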
Code Reference
Source Location
- Repository: transformers
- File: src/transformers/models/auto/auto_factory.py (lines 250-380)
Signature
@classmethod
def from_pretrained(cls, pretrained_model_name_or_path: str | os.PathLike[str], *model_args, **kwargs):
Import
from transformers import AutoModelForCausalLM
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| pretrained_model_name_or_path | str or os.PathLike | Yes | Model ID on the HuggingFace Hub (e.g., "meta-llama/Llama-2-7b-hf") or path to a local directory containing model weights and config |
| *model_args | positional args | No | Additional positional arguments passed to the underlying model class |
| config | PretrainedConfig | No | Model configuration. If not provided, it is loaded from pretrained_model_name_or_path |
| torch_dtype | torch.dtype or str | No | Data type for model weights. Use torch.float16, torch.bfloat16, or "auto" to infer from the checkpoint |
| device_map | str or dict | No | Device placement strategy: "auto" for automatic distribution, "cpu", "cuda:0", or a custom mapping dict |
| attn_implementation | str | No | Attention implementation to use: "eager", "sdpa" (Scaled Dot-Product Attention), or "flash_attention_2" |
| trust_remote_code | bool | No | Whether to allow custom model code from the Hub (defaults to False) |
| quantization_config | QuantizationConfigMixin | No | Configuration for quantized loading (e.g., BitsAndBytesConfig for 4-bit/8-bit quantization) |
| cache_dir | str | No | Directory to cache downloaded model files |
| revision | str | No | Model version to use (branch, tag, or commit hash; defaults to "main") |
| token | str or bool | No | Authentication token for accessing gated or private models |
Outputs
| Name | Type | Description |
|---|---|---|
| model | PreTrainedModel | An instantiated causal language model with pretrained weights loaded, ready for training or inference |
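The returned object is a standard torch.nn.Module subclass, and computing a training loss only requires passing labels. The sketch below builds a tiny randomly initialized GPT-2 from a config instead of downloading a checkpoint (the config sizes are arbitrary), but the training-side contract of the returned model is the same.

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny randomly initialized model (no download) to illustrate the contract
# of the returned PreTrainedModel; the config sizes here are arbitrary.
config = GPT2Config(n_layer=2, n_head=2, n_embd=64, vocab_size=128)
model = GPT2LMHeadModel(config)
model.train()

# Causal LM forward pass: passing labels makes the model compute and
# return a language-modeling (cross-entropy) loss.
input_ids = torch.randint(0, config.vocab_size, (1, 8))
out = model(input_ids=input_ids, labels=input_ids)
out.loss.backward()  # gradients flow, so the model is ready for training
```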
Usage Examples
Basic Usage
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("gpt2")
Loading for Fine-Tuning with BF16
import torch
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
"meta-llama/Llama-2-7b-hf",
torch_dtype=torch.bfloat16,
device_map="auto",
attn_implementation="flash_attention_2",
)
Loading with 4-bit Quantization for QLoRA
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
"meta-llama/Llama-2-7b-hf",
quantization_config=bnb_config,
device_map="auto",
)