Implementation:Huggingface Transformers AutoModelForCausalLM From Pretrained For PEFT

From Leeroopedia
Knowledge Sources
Domains: Parameter_Efficient_Fine_Tuning, NLP, Model_Loading
Last Updated: 2026-02-13 00:00 GMT

Overview

A concrete tool, provided by the Hugging Face Transformers library, for loading a pretrained causal language model that serves as the base for PEFT adapter injection.

Description

AutoModelForCausalLM.from_pretrained is the primary entry point for loading a pretrained causal language model. In the context of PEFT, this method is used to load the base model whose weights will remain frozen while adapter parameters are trained on top.
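The split between frozen base weights and trainable adapter weights can be checked numerically. The following is a minimal sketch, assuming a PyTorch-style model whose parameters() yields objects exposing numel() and requires_grad. The helper name count_parameters is hypothetical and is not part of Transformers or PEFT.

```python
from typing import Iterable, Protocol, Tuple


class ParamLike(Protocol):
    """Minimal interface of a torch.nn.Parameter that this sketch relies on."""

    requires_grad: bool

    def numel(self) -> int: ...


def count_parameters(params: Iterable[ParamLike]) -> Tuple[int, int]:
    """Return (trainable, total) element counts for the given parameters."""
    trainable = 0
    total = 0
    for p in params:
        n = p.numel()
        total += n
        # Only parameters with requires_grad=True receive gradient updates.
        if p.requires_grad:
            trainable += n
    return trainable, total
```

Called as count_parameters(model.parameters()) after an adapter has been injected, the trainable count is expected to be a small fraction of the total, since only adapter parameters should require gradients.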

The method performs several PEFT-relevant operations:

  • Adapter auto-detection: If the provided path contains an adapter_config.json file (checked via find_adapter_config_file), it reads the adapter configuration, extracts the base_model_name_or_path, and transparently loads the correct base model before attaching the adapter.
  • Quantization support: Accepts a quantization_config parameter (e.g., BitsAndBytesConfig) that enables 4-bit or 8-bit quantization of the base model. QLoRA workflows rely on the 4-bit path.
  • Device mapping: The device_map parameter enables automatic distribution of model layers across available hardware, which is needed when the base model does not fit on a single device.
  • Config resolution: The method resolves the model configuration from the Hub or local path, determines the correct model class from the _model_mapping, and delegates to the concrete class's from_pretrained.
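The adapter auto-detection step can be illustrated with a simplified sketch. It assumes a local directory only and reads adapter_config.json directly; the library's find_adapter_config_file also handles Hub repositories, caching, and authentication, none of which is reproduced here. The helper name resolve_base_model is hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def resolve_base_model(path: str) -> Optional[str]:
    """Return base_model_name_or_path if `path` is a local adapter directory.

    Returns None when no adapter_config.json is present, i.e. when `path`
    looks like a regular model directory rather than an adapter checkpoint.
    """
    config_file = Path(path) / "adapter_config.json"
    if not config_file.is_file():
        return None
    with config_file.open(encoding="utf-8") as f:
        adapter_config = json.load(f)
    # The adapter config records which base model the adapter was trained on.
    return adapter_config.get("base_model_name_or_path")
```

A non-None result corresponds to the case described above: the returned identifier is the base model to load first, and the adapter weights in the same directory are attached afterwards.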

Usage

Use this method whenever you need to:

  • Load a base model before calling model.add_adapter() to inject a new adapter for training
  • Load a model with quantization for QLoRA training
  • Reload a model from an adapter checkpoint (the method auto-detects the base model)
  • Distribute a large model across GPUs with device_map="auto" prior to adapter attachment
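The first use case, loading a base model and then calling model.add_adapter(), might look like the sketch below. It assumes the peft package is installed, and the LoRA hyperparameters and target module names are illustrative assumptions (Llama-style projection names), not recommendations from this page. The wrapper name load_and_attach_lora is hypothetical.

```python
def load_and_attach_lora(model_id: str, adapter_name: str = "default"):
    """Load a causal LM and attach a fresh LoRA adapter (illustrative sketch)."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import LoraConfig
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Assumed hyperparameters; tune them for the actual model and task.
    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # assumption: Llama-style layer names
        task_type="CAUSAL_LM",
    )

    # Inject the adapter layers into the loaded base model.
    model.add_adapter(lora_config, adapter_name=adapter_name)
    return model
```

Calling load_and_attach_lora("meta-llama/Llama-2-7b-hf") would download the base weights, so it needs network access and, for gated models, a valid token.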

Code Reference

Source Location

  • Repository: transformers
  • File: src/transformers/models/auto/auto_factory.py (lines 250-379)

Signature

@classmethod
def from_pretrained(
    cls,
    pretrained_model_name_or_path: str | os.PathLike[str],
    *model_args,
    **kwargs
)

Import

from transformers import AutoModelForCausalLM

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| pretrained_model_name_or_path | str or os.PathLike | Yes | Hub model ID (e.g., "meta-llama/Llama-2-7b-hf"), local directory path, or path to an adapter checkpoint (auto-detected). |
| device_map | str or dict or int or torch.device | No | Device placement strategy. Use "auto" to distribute across available GPUs. Defaults to None, in which case no automatic placement is performed and the model is loaded on the default device (usually CPU). |
| quantization_config | BitsAndBytesConfig or QuantizationConfigMixin | No | Quantization configuration for 4-bit or 8-bit loading. Required for QLoRA workflows. |
| torch_dtype | torch.dtype or "auto" | No | Data type for model weights. Common values: torch.float16, torch.bfloat16, or "auto". Recent Transformers releases also accept this argument under the name dtype. |
| trust_remote_code | bool | No | Whether to allow loading custom model code from the Hub. Defaults to False. |
| adapter_kwargs | dict | No | Additional keyword arguments passed to the adapter loading mechanism (e.g., token, revision). |
| config | PreTrainedConfig | No | A pre-loaded configuration object. If not provided, the config is loaded from the model path. |
| token | str or bool | No | Authentication token for private Hub models. |
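The optional inputs above are often assembled conditionally before the call. The sketch below shows one way to do that in plain Python; the helper build_load_kwargs and its parameter names are hypothetical, and it only builds a dictionary without importing Transformers.

```python
from typing import Any, Dict, Optional


def build_load_kwargs(
    multi_gpu: bool = False,
    dtype: Any = "auto",
    quantization_config: Optional[Any] = None,
    token: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for from_pretrained, omitting unset options."""
    kwargs: Dict[str, Any] = {"torch_dtype": dtype, "trust_remote_code": False}
    if multi_gpu:
        # Let the library distribute layers across the available devices.
        kwargs["device_map"] = "auto"
    if quantization_config is not None:
        kwargs["quantization_config"] = quantization_config
    if token is not None:
        kwargs["token"] = token
    return kwargs
```

The result would be passed as AutoModelForCausalLM.from_pretrained(model_id, **build_load_kwargs(multi_gpu=True)). Leaving unset options out of the dictionary keeps the library defaults in effect.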

Outputs

| Name | Type | Description |
|---|---|---|
| model | PreTrainedModel | A fully initialized pretrained model instance, ready for adapter injection or inference. If an adapter checkpoint was detected, the adapter is already attached. |

Usage Examples

Basic Usage: Load Base Model for LoRA

from transformers import AutoModelForCausalLM
import torch

# Load base model in half precision
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)

QLoRA: Load with 4-bit Quantization

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=quantization_config,
    device_map="auto",
)

Load from Adapter Checkpoint (Auto-Detection)

from transformers import AutoModelForCausalLM

# If the path contains adapter_config.json, the base model is loaded automatically
model = AutoModelForCausalLM.from_pretrained(
    "my-org/llama-2-7b-lora-adapter",
    device_map="auto",
)

Related Pages

Implements Principle

Requires Environment
