Implementation: Hugging Face Transformers AutoModelForCausalLM.from_pretrained for Training
| Knowledge Sources | |
|---|---|
| Domains | NLP, Training, Deep Learning |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Concrete tool for loading a pretrained causal language model ready for fine-tuning, provided by the Hugging Face Transformers library.
Description
AutoModelForCausalLM.from_pretrained() is a factory class method defined in the auto_factory module that automatically resolves the correct causal language model class (e.g., LlamaForCausalLM, GPT2LMHeadModel, MistralForCausalLM) based on the model's configuration and loads its pretrained weights. The method follows a three-step dispatch pattern:
- Load or receive the model configuration to determine the model_type.
- Look up the model type in MODEL_FOR_CAUSAL_LM_MAPPING to find the concrete class.
- Delegate to that class's own from_pretrained() to instantiate and load weights.
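The three-step dispatch above can be sketched with a plain registry dict. This is an illustrative stand-in, not the real implementation: DummyConfig, DummyLMModel, REGISTRY, and auto_from_pretrained are all hypothetical names standing in for transformers' AutoConfig, concrete model classes, and MODEL_FOR_CAUSAL_LM_MAPPING.

```python
# Illustrative sketch of the Auto-class dispatch pattern; all names here
# are hypothetical stand-ins for the transformers internals.

class DummyConfig:
    model_type = "dummy-lm"

class DummyLMModel:
    @classmethod
    def from_pretrained(cls, name_or_path, **kwargs):
        # Step 3: the concrete class instantiates itself and loads weights.
        model = cls()
        model.name_or_path = name_or_path
        return model

# Stand-in for MODEL_FOR_CAUSAL_LM_MAPPING: model_type -> concrete class.
REGISTRY = {"dummy-lm": DummyLMModel}

def auto_from_pretrained(name_or_path, **kwargs):
    # Step 1: load the configuration to determine the model_type.
    config = DummyConfig()  # real code: AutoConfig.from_pretrained(name_or_path)
    # Step 2: look up the concrete class for that model_type.
    model_class = REGISTRY[config.model_type]
    # Step 3: delegate to that class's own from_pretrained().
    return model_class.from_pretrained(name_or_path, **kwargs)

model = auto_from_pretrained("org/dummy-lm-7b")
```

This registry-plus-delegation shape is why AutoModelForCausalLM itself is never instantiated: the object you get back is always an instance of the concrete class resolved in step 2.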
When loading for training, key parameters include torch_dtype (to control precision), device_map (to distribute across GPUs), and attn_implementation (to select optimized attention kernels like Flash Attention 2 or SDPA).
Usage
Use AutoModelForCausalLM.from_pretrained() when starting a fine-tuning or continued pretraining workflow for a causal (autoregressive) language model. This should be called after setting up the tokenizer and before initializing the Trainer.
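A minimal sketch of that ordering is below; build_trainer, the train_dataset argument, and the TrainingArguments values are illustrative placeholders, not part of the library API.

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

def build_trainer(model_id, train_dataset):
    """Hypothetical helper showing the usual setup order for fine-tuning."""
    # 1. Tokenizer first, so special tokens (e.g., a pad token) can be
    #    checked and aligned with the model before training starts.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # 2. Then the model.
    model = AutoModelForCausalLM.from_pretrained(model_id)
    # 3. Finally the Trainer, which receives both.
    args = TrainingArguments(output_dir="outputs")
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        tokenizer=tokenizer,
    )
```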
Code Reference
Source Location
- Repository: transformers
- File: src/transformers/models/auto/auto_factory.py (lines 250-380)
Signature
@classmethod
def from_pretrained(cls, pretrained_model_name_or_path: str | os.PathLike[str], *model_args, **kwargs):
Import
from transformers import AutoModelForCausalLM
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| pretrained_model_name_or_path | str or os.PathLike | Yes | Model ID on the HuggingFace Hub (e.g., "meta-llama/Llama-2-7b-hf") or path to a local directory containing model weights and config |
| *model_args | positional args | No | Additional positional arguments passed to the underlying model class |
| config | PretrainedConfig | No | Model configuration. If not provided, it is loaded from pretrained_model_name_or_path |
| torch_dtype | torch.dtype or str | No | Data type for model weights. Use torch.float16, torch.bfloat16, or "auto" to infer from the checkpoint |
| device_map | str or dict | No | Device placement strategy: "auto" for automatic distribution, "cpu", "cuda:0", or a custom mapping dict |
| attn_implementation | str | No | Attention implementation to use: "eager", "sdpa" (Scaled Dot-Product Attention), or "flash_attention_2" |
| trust_remote_code | bool | No | Whether to allow custom model code from the Hub (defaults to False) |
| quantization_config | QuantizationConfigMixin | No | Configuration for quantized loading (e.g., BitsAndBytesConfig for 4-bit/8-bit quantization) |
| cache_dir | str | No | Directory to cache downloaded model files |
| revision | str | No | Model version to use (branch, tag, or commit hash; defaults to "main") |
| token | str or bool | No | Authentication token for accessing gated or private models |
Outputs
| Name | Type | Description |
|---|---|---|
| model | PreTrainedModel | An instantiated causal language model with pretrained weights loaded, ready for training or inference |
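The returned object is a standard torch.nn.Module subclass, and computing a training loss only requires passing labels. The sketch below builds a tiny randomly initialized GPT-2 from a config instead of downloading a checkpoint (the config sizes are arbitrary), but the training-side contract of the returned model is the same.

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny randomly initialized model (no download) to illustrate the contract
# of the returned PreTrainedModel; the config sizes here are arbitrary.
config = GPT2Config(n_layer=2, n_head=2, n_embd=64, vocab_size=128)
model = GPT2LMHeadModel(config)
model.train()

# Causal LM forward pass: passing labels makes the model compute and
# return a language-modeling (cross-entropy) loss.
input_ids = torch.randint(0, config.vocab_size, (1, 8))
out = model(input_ids=input_ids, labels=input_ids)
out.loss.backward()  # gradients flow, so the model is ready for training
```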
Usage Examples
Basic Usage
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("gpt2")
Loading for Fine-Tuning with BF16
import torch
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
"meta-llama/Llama-2-7b-hf",
torch_dtype=torch.bfloat16,
device_map="auto",
attn_implementation="flash_attention_2",
)
Loading with 4-bit Quantization for QLoRA
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
"meta-llama/Llama-2-7b-hf",
quantization_config=bnb_config,
device_map="auto",
)