Implementation: axolotl-ai-cloud/axolotl ModelLoader.load()
| Knowledge Sources | |
|---|---|
| Domains | Model_Loading, Quantization |
| Last Updated | 2026-02-06 23:00 GMT |
Overview
A concrete tool in the Axolotl framework for loading pre-trained language models with optional quantization.
Description
The ModelLoader class handles the complete model loading pipeline in Axolotl. It configures quantization (4-bit NF4, 8-bit INT8, GPTQ), sets up device mapping for multi-GPU, applies model-specific patches (flash attention, RoPE scaling), and instantiates the model via HuggingFace AutoModelForCausalLM. The load() method orchestrates the full pipeline and returns the model with an optional PeftConfig.
Key responsibilities include:
- Configuring BitsAndBytesConfig for quantized loading
- Setting up device maps for model parallelism
- Applying monkey patches for optimized training
- Handling model architecture-specific quirks (embedding resizing, dtype fixes)
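As a rough illustration of the first responsibility, the translation from config flags to BitsAndBytesConfig-style keyword arguments might look like the sketch below. This is not Axolotl's actual code; the input keys mirror the cfg fields listed under I/O Contract, and the double-quantization default is an assumption.

```python
def build_quant_kwargs(cfg: dict) -> dict:
    """Sketch: map config flags to BitsAndBytesConfig-style kwargs.

    Illustrative only; Axolotl's real logic lives in
    src/axolotl/loaders/model.py (quantization config section).
    """
    if cfg.get("load_in_4bit"):
        return {
            "load_in_4bit": True,
            "bnb_4bit_quant_type": cfg.get("quant_type", "nf4"),
            # compute dtype follows the bf16/fp16 training flags
            "bnb_4bit_compute_dtype": "bfloat16" if cfg.get("bf16") else "float16",
            # double quantization assumed on; check the source for the real default
            "bnb_4bit_use_double_quant": True,
        }
    if cfg.get("load_in_8bit"):
        return {"load_in_8bit": True}
    return {}  # full-precision load: no quantization config at all

print(build_quant_kwargs({"load_in_4bit": True, "quant_type": "nf4"}))
```

Note that 4-bit takes precedence over 8-bit here; the real loader validates that only one of the two flags is set.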
Usage
Use this implementation when loading a causal language model for QLoRA/LoRA fine-tuning. The ModelLoader handles all quantization configuration automatically based on the YAML config.
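For example, a QLoRA fine-tuning config might include keys like the following (illustrative fragment; verify key names against the Axolotl config schema for your version):

```yaml
base_model: meta-llama/Llama-3.2-1B
load_in_4bit: true
adapter: qlora
bf16: true
flash_attention: true
```

With a config like this, ModelLoader selects 4-bit NF4 quantization and applies the flash attention patch without any further code on the caller's side.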
Code Reference
Source Location
- Repository: axolotl
- File: src/axolotl/loaders/model.py
- Lines: L67-883 (class), L98-144 (init), L162-191 (load method), L515-597 (quantization config), L698-815 (build model)
Signature
class ModelLoader:
    """Load pretrained models with quantization and patching support."""

    def __init__(
        self,
        cfg: DictDefault,
        tokenizer: PreTrainedTokenizerBase,
        *,
        inference: bool = False,
        reference_model: bool = False,
        **kwargs,
    ):
        """
        Args:
            cfg: Training configuration dictionary.
            tokenizer: Pre-loaded tokenizer instance.
            inference: Whether loading for inference (disables training optimizations).
            reference_model: Whether loading as DPO reference model.
            **kwargs: Additional keyword arguments.
        """

    def load(self) -> tuple[PreTrainedModel | PeftModelForCausalLM, PeftConfig | None]:
        """Load and configure the model.

        Returns:
            Tuple of (model instance with quantization applied, PeftConfig or None).
        """
Import
from axolotl.loaders.model import ModelLoader
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| cfg | DictDefault | Yes | Config with base_model, load_in_4bit, load_in_8bit, quant_type, bf16/fp16, flash_attention, device_map, etc. |
| tokenizer | PreTrainedTokenizerBase | Yes | Pre-loaded tokenizer for embedding resizing |
| inference | bool | No (default: False) | Load for inference only (skip training optimizations) |
| reference_model | bool | No (default: False) | Load as DPO reference model |
Outputs
| Name | Type | Description |
|---|---|---|
| model | PreTrainedModel or PeftModelForCausalLM | Loaded model with quantization and patches applied |
| peft_config | PeftConfig or None | PEFT configuration if adapter was loaded from checkpoint, None otherwise |
Usage Examples
Loading a QLoRA Model
from axolotl.loaders.model import ModelLoader
from axolotl.loaders.tokenizer import load_tokenizer
from axolotl.utils.dict import DictDefault

# Config requesting 4-bit NF4 quantization
cfg = DictDefault(
    {
        "base_model": "meta-llama/Llama-3.2-1B",
        "load_in_4bit": True,
        "quant_type": "nf4",
    }
)

tokenizer = load_tokenizer(cfg)
loader = ModelLoader(cfg, tokenizer)
model, peft_config = loader.load()

print(model.dtype)  # compute dtype, e.g. torch.float16 or torch.bfloat16 depending on bf16/fp16
print(model.config.quantization_config)  # BitsAndBytesConfig
Loading for Inference
loader = ModelLoader(cfg, tokenizer, inference=True)
model, _ = loader.load()