Implementation:Huggingface Transformers AutoModelForCausalLM From Pretrained For PEFT
| Knowledge Sources | |
|---|---|
| Domains | Parameter_Efficient_Fine_Tuning, NLP, Model_Loading |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Concrete tool for loading a pretrained causal language model that will serve as the base for PEFT adapter injection, provided by the Hugging Face Transformers library.
Description
AutoModelForCausalLM.from_pretrained is the primary entry point for loading a pretrained causal language model. In the context of PEFT, this method is used to load the base model whose weights will remain frozen while adapter parameters are trained on top.
The method performs several PEFT-relevant operations:
- Adapter auto-detection: If the provided path contains an adapter_config.json file (checked via find_adapter_config_file), it reads the adapter configuration, extracts the base_model_name_or_path, and transparently loads the correct base model before attaching the adapter.
- Quantization support: Accepts a quantization_config parameter (e.g., BitsAndBytesConfig) that enables 4-bit or 8-bit quantization of the base model, which is essential for QLoRA workflows.
- Device mapping: The device_map parameter enables automatic distribution of model layers across available hardware, critical for large models used in PEFT.
- Config resolution: The method resolves the model configuration from the Hub or local path, determines the correct model class from the _model_mapping, and delegates to the concrete class's from_pretrained.
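The adapter auto-detection step can be illustrated with a minimal, local-only sketch. It is not the library's find_adapter_config_file, which also resolves Hub repositories. The helper name detect_adapter_base_model is hypothetical and only shows the idea of checking for adapter_config.json and reading base_model_name_or_path from it.

```python
import json
import tempfile
from pathlib import Path

ADAPTER_CONFIG_NAME = "adapter_config.json"


def detect_adapter_base_model(path):
    """Return the base model ID recorded in a local adapter checkpoint, or None.

    Hypothetical helper: a simplified stand-in for the library's detection logic.
    """
    config_file = Path(path) / ADAPTER_CONFIG_NAME
    if not config_file.is_file():
        # No adapter config: treat the path as an ordinary model directory.
        return None
    with config_file.open(encoding="utf-8") as handle:
        adapter_config = json.load(handle)
    return adapter_config.get("base_model_name_or_path")


# Demo on a throwaway directory that mimics an adapter checkpoint.
with tempfile.TemporaryDirectory() as checkpoint_dir:
    plain_result = detect_adapter_base_model(checkpoint_dir)
    (Path(checkpoint_dir) / ADAPTER_CONFIG_NAME).write_text(
        json.dumps({"base_model_name_or_path": "meta-llama/Llama-2-7b-hf"}),
        encoding="utf-8",
    )
    adapter_result = detect_adapter_base_model(checkpoint_dir)

print(plain_result, adapter_result)
```

When the helper returns a model ID, that ID plays the role of the base model to load first, with the adapter weights attached afterwards.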
Usage
Use this method whenever you need to:
- Load a base model before calling model.add_adapter() to inject a new adapter for training
- Load a model with quantization for QLoRA training
- Reload a model from an adapter checkpoint (the method auto-detects the base model)
- Distribute a large model across GPUs with device_map="auto" prior to adapter attachment
Code Reference
Source Location
- Repository: transformers
- File: src/transformers/models/auto/auto_factory.py (lines 250-379)
Signature
@classmethod
def from_pretrained(
    cls,
    pretrained_model_name_or_path: str | os.PathLike[str],
    *model_args,
    **kwargs
)
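The signature is generic because the auto class only picks a concrete model class and forwards everything else. The toy sketch below mimics that dispatch under simplifying assumptions: every class name prefixed with Toy and the TOY_CONFIGS lookup are hypothetical stand-ins, not Transformers APIs, and the real method resolves the configuration from the Hub or a local path.

```python
class ToyLlamaConfig:
    model_type = "llama"


class ToyGPT2Config:
    model_type = "gpt2"


class ToyModel:
    def __init__(self, name, config, load_kwargs):
        self.name = name
        self.config = config
        self.load_kwargs = load_kwargs

    @classmethod
    def from_pretrained(cls, name, *model_args, config=None, **kwargs):
        # A real model class would load weights here; the toy only records inputs.
        return cls(name, config, kwargs)


class ToyLlamaForCausalLM(ToyModel):
    pass


class ToyGPT2LMHeadModel(ToyModel):
    pass


# Stand-in for reading a configuration file from the Hub or a local path.
TOY_CONFIGS = {"toy/llama": ToyLlamaConfig(), "toy/gpt2": ToyGPT2Config()}


class ToyAutoModelForCausalLM:
    # Maps a config class to the concrete model class that handles it.
    _model_mapping = {
        ToyLlamaConfig: ToyLlamaForCausalLM,
        ToyGPT2Config: ToyGPT2LMHeadModel,
    }

    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs):
        # Use a caller-supplied config if present, otherwise resolve one.
        config = kwargs.pop("config", None)
        if config is None:
            config = TOY_CONFIGS[pretrained_model_name_or_path]
        model_class = cls._model_mapping.get(type(config))
        if model_class is None:
            raise ValueError(f"Unsupported config type: {type(config).__name__}")
        # Delegate to the concrete class, forwarding the remaining arguments.
        return model_class.from_pretrained(
            pretrained_model_name_or_path, *model_args, config=config, **kwargs
        )


model = ToyAutoModelForCausalLM.from_pretrained("toy/llama", device_map="auto")
print(type(model).__name__, model.load_kwargs)
```

Keyword arguments such as device_map pass through untouched, which is why the auto class can accept options that only the concrete class understands.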
Import
from transformers import AutoModelForCausalLM
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| pretrained_model_name_or_path | str or os.PathLike | Yes | Hub model ID (e.g., "meta-llama/Llama-2-7b-hf"), local directory path, or path to an adapter checkpoint (auto-detected). |
| device_map | str or dict or int or torch.device | No | Device placement strategy. Use "auto" to distribute across available GPUs. Defaults to None (single device). |
| quantization_config | BitsAndBytesConfig or QuantizationConfigMixin | No | Quantization configuration for 4-bit or 8-bit loading. Required for QLoRA workflows. |
| torch_dtype | torch.dtype or "auto" | No | Data type for model weights. Common values: torch.float16, torch.bfloat16, or "auto". |
| trust_remote_code | bool | No | Whether to allow loading custom model code from the Hub. Defaults to False. |
| adapter_kwargs | dict | No | Additional keyword arguments passed to the adapter loading mechanism (e.g., token, revision). |
| config | PreTrainedConfig | No | A pre-loaded configuration object. If not provided, the config is loaded from the model path. |
| token | str or bool | No | Authentication token for private Hub models. |
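Because device_map also accepts a dict, placement can be spelled out by hand instead of using "auto". The sketch below builds such a dict by splitting decoder layers into contiguous blocks. The helper build_layer_device_map is hypothetical, and the module names (model.embed_tokens, model.layers.N, model.norm, lm_head) are an assumption based on Llama-style models; they vary by architecture and library version, and an explicit map generally has to cover every module that holds weights.

```python
def build_layer_device_map(num_layers: int, num_devices: int) -> dict[str, int]:
    """Assign decoder layers to devices in contiguous blocks (hypothetical helper)."""
    if num_layers < 1 or num_devices < 1:
        raise ValueError("num_layers and num_devices must be positive")

    # Assumed Llama-style module names; check the actual model before use.
    device_map = {"model.embed_tokens": 0}
    for layer_idx in range(num_layers):
        # Integer scaling keeps neighbouring layers on the same device.
        device_map[f"model.layers.{layer_idx}"] = layer_idx * num_devices // num_layers

    # Place the final norm and output head with the last decoder layer.
    last_device = (num_layers - 1) * num_devices // num_layers
    device_map["model.norm"] = last_device
    device_map["lm_head"] = last_device
    return device_map


example_map = build_layer_device_map(num_layers=4, num_devices=2)
print(example_map)
```

A dict of this shape could then be passed as device_map to from_pretrained, provided its keys match the loaded model's module names.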
Outputs
| Name | Type | Description |
|---|---|---|
| model | PreTrainedModel | A fully initialized pretrained model instance, ready for adapter injection or inference. If an adapter checkpoint was detected, the adapter is already attached. |
Usage Examples
Basic Usage: Load Base Model for LoRA
from transformers import AutoModelForCausalLM
import torch

# Load base model in half precision
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)
QLoRA: Load with 4-bit Quantization
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=quantization_config,
    device_map="auto",
)
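To see why 4-bit loading matters for QLoRA, a back-of-the-envelope estimate of weight memory helps. The sketch below assumes roughly 7 billion parameters and counts only the bits needed to store the weights. It ignores quantization constants, layers that stay in higher precision, activations, and optimizer state, so real usage will be higher. The helper weight_memory_gib is hypothetical.

```python
def weight_memory_gib(num_params: int, bits_per_param: int) -> float:
    """Approximate memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 7_000_000_000  # assumed round figure for a "7B" model

for label, bits in [("float16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

Under these assumptions the weights take about 13 GiB in float16 and about 3.3 GiB at 4 bits, a fourfold reduction before any overhead is counted.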
Load from Adapter Checkpoint (Auto-Detection)
from transformers import AutoModelForCausalLM

# If the path contains adapter_config.json, the base model is loaded automatically
model = AutoModelForCausalLM.from_pretrained(
    "my-org/llama-2-7b-lora-adapter",
    device_map="auto",
)
Related Pages
Implements Principle
Requires Environment