Implementation:Huggingface Transformers AutoModelForCausalLM From Pretrained For PEFT

Knowledge Sources	Transformers PEFT Docs Transformers Docs
Domains	Parameter_Efficient_Fine_Tuning, NLP, Model_Loading
Last Updated	2026-02-13 00:00 GMT

Overview

Concrete tool for loading a pretrained causal language model that will serve as the base for PEFT adapter injection, provided by the Hugging Face Transformers library.

Description

AutoModelForCausalLM.from_pretrained is the primary entry point for loading a pretrained causal language model. In the context of PEFT, this method is used to load the base model whose weights will remain frozen while adapter parameters are trained on top.

The method performs several PEFT-relevant operations:

Adapter auto-detection: If the provided path contains an adapter_config.json file (checked via find_adapter_config_file), it reads the adapter configuration, extracts the base_model_name_or_path, and transparently loads the correct base model before attaching the adapter.
Quantization support: Accepts a quantization_config parameter (e.g., BitsAndBytesConfig) that enables 4-bit or 8-bit quantization of the base model, which is essential for QLoRA workflows.
Device mapping: The device_map parameter enables automatic distribution of model layers across available hardware, which matters when the base model is too large to fit on a single device.
Config resolution: The method resolves the model configuration from the Hub or local path, determines the correct model class from the _model_mapping, and delegates to the concrete class's from_pretrained.
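The config-resolution step can be pictured as a lookup from config type to model class followed by delegation. The sketch below is a toy illustration only: every class and name in it (ToyLlamaConfig, ToyGPT2Config, TOY_MODEL_MAPPING, toy_auto_from_pretrained) is hypothetical and stands in for the library's real config classes and _model_mapping; it is not the Transformers implementation.

```python
from dataclasses import dataclass


@dataclass
class ToyLlamaConfig:
    """Hypothetical stand-in for a model-specific config class."""
    model_type: str = "llama"


@dataclass
class ToyGPT2Config:
    """Hypothetical stand-in for a second config class."""
    model_type: str = "gpt2"


class ToyModelBase:
    """Hypothetical concrete model class with its own from_pretrained."""

    @classmethod
    def from_pretrained(cls, path, config=None, **kwargs):
        # A real implementation would load weights; here we only report
        # which concrete class ended up handling the call.
        return f"{cls.__name__} loaded from {path}"


class ToyLlamaForCausalLM(ToyModelBase):
    pass


class ToyGPT2ForCausalLM(ToyModelBase):
    pass


# Hypothetical analogue of the auto class's _model_mapping:
# config type -> concrete model class.
TOY_MODEL_MAPPING = {
    ToyLlamaConfig: ToyLlamaForCausalLM,
    ToyGPT2Config: ToyGPT2ForCausalLM,
}


def toy_auto_from_pretrained(path, config, **kwargs):
    """Pick the concrete class for this config, then delegate to it."""
    model_class = TOY_MODEL_MAPPING.get(type(config))
    if model_class is None:
        raise ValueError(f"Unrecognized config class: {type(config).__name__}")
    return model_class.from_pretrained(path, config=config, **kwargs)


result = toy_auto_from_pretrained("some/local/path", ToyLlamaConfig())
print(result)
```

The point of the pattern is that callers never name the concrete class; the configuration alone decides which class receives the from_pretrained call.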

Usage

Use this method whenever you need to:

Load a base model before calling model.add_adapter() to inject a new adapter for training
Load a model with quantization for QLoRA training
Reload a model from an adapter checkpoint (the method auto-detects the base model)
Distribute a large model across GPUs with device_map="auto" prior to adapter attachment

Code Reference

Source Location

Repository: transformers
File: src/transformers/models/auto/auto_factory.py (lines 250-379)

Signature

@classmethod
def from_pretrained(
    cls,
    pretrained_model_name_or_path: str | os.PathLike[str],
    *model_args,
    **kwargs
)

Import

from transformers import AutoModelForCausalLM

I/O Contract

Inputs

Name	Type	Required	Description
pretrained_model_name_or_path	`str` or `os.PathLike`	Yes	Hub model ID (e.g., `"meta-llama/Llama-2-7b-hf"`), local directory path, or path to an adapter checkpoint (auto-detected).
device_map	`str` or `dict` or `int` or `torch.device`	No	Device placement strategy. Use `"auto"` to distribute layers across the available devices. Defaults to `None`, in which case no automatic placement is performed.
quantization_config	`BitsAndBytesConfig` or `QuantizationConfigMixin`	No	Quantization configuration for 4-bit or 8-bit loading. Required for QLoRA workflows.
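torch_dtype	`torch.dtype` or `"auto"`	No	Data type for model weights. Common values: `torch.float16`, `torch.bfloat16`, or `"auto"`. Newer Transformers releases expose the same option as `dtype`; check which name your installed version expects.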
trust_remote_code	`bool`	No	Whether to allow loading custom model code from the Hub. Defaults to `False`.
adapter_kwargs	`dict`	No	Additional keyword arguments passed to the adapter loading mechanism (e.g., `token`, `revision`).
config	`PreTrainedConfig`	No	A pre-loaded configuration object. If not provided, the config is loaded from the model path.
token	`str` or `bool`	No	Authentication token for private Hub models.
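When device_map is given as a dict, it maps module-name prefixes to devices. The helper below is a minimal sketch of building such a dict by splitting decoder layers evenly across devices. The function name build_even_device_map is hypothetical, and the module names ("model.embed_tokens", "model.layers.N", "model.norm", "lm_head") are an assumption based on the Llama layout; inspect your own model's module names before using anything like this, since a hand-written map generally has to cover every module that holds parameters.

```python
def build_even_device_map(num_layers, num_devices):
    """Build an explicit device_map dict, spreading layers evenly.

    Module names are assumed to follow the Llama naming scheme; other
    architectures use different names.
    """
    if num_layers <= 0 or num_devices <= 0:
        raise ValueError("num_layers and num_devices must be positive")

    # Input embeddings go on the first device.
    device_map = {"model.embed_tokens": 0}

    # Assign contiguous blocks of decoder layers to each device.
    for layer_idx in range(num_layers):
        device_map[f"model.layers.{layer_idx}"] = layer_idx * num_devices // num_layers

    # Final norm and output head go on the last device.
    last_device = num_devices - 1
    device_map["model.norm"] = last_device
    device_map["lm_head"] = last_device
    return device_map


example_map = build_even_device_map(num_layers=32, num_devices=2)
print(example_map["model.layers.15"], example_map["model.layers.16"])
```

In most cases device_map="auto" is the simpler choice; an explicit dict is mainly useful when you need to control placement yourself.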

Outputs

Name	Type	Description
model	`PreTrainedModel`	A fully initialized pretrained model instance, ready for adapter injection or inference. If an adapter checkpoint was detected, the adapter is already attached.

Usage Examples

Basic Usage: Load Base Model for LoRA

from transformers import AutoModelForCausalLM
import torch

# Load base model in half precision
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)

QLoRA: Load with 4-bit Quantization

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=quantization_config,
    device_map="auto",
)
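To see why 4-bit loading matters for QLoRA, a rough back-of-the-envelope estimate of weight memory helps. The sketch below assumes a round figure of 7 billion parameters and counts only the raw weight storage; it ignores quantization metadata, any layers left unquantized, activations, and optimizer state, so real usage will be higher. The function name weight_memory_gib is hypothetical.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for the weights alone, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1024**3


# Assumed round figure for a "7B" model; the exact count differs per model.
NUM_PARAMS = 7_000_000_000

for label, bits in [
    ("float32", 32),
    ("float16 / bfloat16", 16),
    ("8-bit", 8),
    ("4-bit", 4),
]:
    print(f"{label:>20}: {weight_memory_gib(NUM_PARAMS, bits):6.2f} GiB")
```

Under these assumptions, half precision needs roughly four times the weight memory of 4-bit storage, which is the gap that makes single-GPU fine-tuning of such models feasible.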

Load from Adapter Checkpoint (Auto-Detection)

from transformers import AutoModelForCausalLM

# If the path contains adapter_config.json, the base model is loaded automatically
model = AutoModelForCausalLM.from_pretrained(
    "my-org/llama-2-7b-lora-adapter",
    device_map="auto",
)
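The auto-detection in this example can be pictured with a simplified, local-only sketch: look for adapter_config.json in the given directory and, if present, read base_model_name_or_path from it. The function resolve_base_model is hypothetical and is not the library's code; the real method goes through find_adapter_config_file and also handles Hub repositories, caching, and authentication.

```python
import json
import tempfile
from pathlib import Path

ADAPTER_CONFIG_NAME = "adapter_config.json"


def resolve_base_model(path):
    """Return (base_model_path, adapter_path_or_None) for a local directory.

    Simplified stand-in for the detection step: Hub lookups are not handled.
    """
    config_file = Path(path) / ADAPTER_CONFIG_NAME
    if not config_file.is_file():
        # No adapter config: treat the path as an ordinary model path.
        return str(path), None
    with config_file.open(encoding="utf-8") as f:
        adapter_config = json.load(f)
    # The adapter config records which base model the adapter was trained on.
    return adapter_config["base_model_name_or_path"], str(path)


# Demonstration with a temporary directory standing in for an adapter checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / ADAPTER_CONFIG_NAME).write_text(
        json.dumps({"base_model_name_or_path": "meta-llama/Llama-2-7b-hf"}),
        encoding="utf-8",
    )
    base, adapter = resolve_base_model(tmp)
    plain_base, plain_adapter = resolve_base_model(Path(tmp) / "not-an-adapter")

print(base, adapter is not None, plain_adapter)
```

Because the base model is resolved from the adapter's own configuration, the adapter checkpoint only needs to ship the adapter weights and this small JSON file.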

Related Pages

Implements Principle

Principle:Huggingface_Transformers_Base_Model_Loading_For_PEFT
