Implementation:Hiyouga LLaMA Factory Model Loader
| Knowledge Sources | |
|---|---|
| Domains | Model Loading, LLM Training |
| Last Updated | 2026-02-06 19:00 GMT |
Overview
Central orchestrator for loading pretrained models, tokenizers, and configurations. It supports several loading paths: standard HuggingFace AutoModel, Unsloth, KTransformers, and Mixture-of-Depths.
Description
This module provides three primary functions that form the model loading pipeline. load_tokenizer loads a pretrained tokenizer and optional processor from HuggingFace Hub (or alternative hubs), applying framework-specific patches. load_config retrieves the model configuration. load_model orchestrates the full loading pipeline: it patches the configuration, applies Liger Kernel optimizations, selects the appropriate loading path (KTransformers, Unsloth, Mixture-of-Depths, or standard AutoModel), initializes adapters, optionally wraps with a value head for RLHF, and finalizes with parameter statistics logging. The module also defines TokenizerModule as a typed dictionary containing the tokenizer and optional processor.
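The path selection described above can be pictured as a simple precedence check. The following is a minimal sketch, not the code in loader.py: the `LoaderFlags` class, its field names, and the returned labels are hypothetical stand-ins, and the precedence simply follows the order in which the paths are listed above. The real conditions in the source may depend on further settings.

```python
from dataclasses import dataclass


@dataclass
class LoaderFlags:
    """Hypothetical stand-in for the backend switches carried by ModelArguments."""

    use_kt: bool = False             # assumed name for the KTransformers switch
    use_unsloth: bool = False        # assumed name for the Unsloth switch
    mixture_of_depths: bool = False  # assumed name for the Mixture-of-Depths switch


def select_loading_path(flags: LoaderFlags) -> str:
    """Return a label for the first enabled path, falling back to AutoModel."""
    if flags.use_kt:
        return "ktransformers"
    if flags.use_unsloth:
        return "unsloth"
    if flags.mixture_of_depths:
        return "mixture_of_depths"
    return "auto_model"


# With no switch enabled, the standard AutoModel path is chosen.
print(select_loading_path(LoaderFlags()))
```

Keeping the decision in one small function makes the fallback explicit: when no alternative path is enabled, loading goes through the standard path.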
Usage
Use load_tokenizer and load_model as the primary entry points whenever a model needs to be loaded for training, inference, or export. These functions handle all configuration combinations including quantized models, multimodal models (image-text, audio-text), and adapter loading.
Code Reference
Source Location
- Repository: Hiyouga_LLaMA_Factory
- File: src/llamafactory/model/loader.py
- Lines: 1-247
Signature
class TokenizerModule(TypedDict):
    tokenizer: "PreTrainedTokenizer"
    processor: Optional["ProcessorMixin"]

def _get_init_kwargs(model_args: "ModelArguments") -> dict[str, Any]:
    ...

def load_tokenizer(model_args: "ModelArguments") -> "TokenizerModule":
    ...

def load_config(model_args: "ModelArguments") -> "PretrainedConfig":
    ...

def load_model(
    tokenizer: "PreTrainedTokenizer",
    model_args: "ModelArguments",
    finetuning_args: "FinetuningArguments",
    is_trainable: bool = False,
    add_valuehead: bool = False,
) -> "PreTrainedModel":
    ...
Import
from llamafactory.model.loader import load_tokenizer, load_config, load_model
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_args | ModelArguments | Yes | Model configuration including model path, trust_remote_code, quantization settings, etc. |
| finetuning_args | FinetuningArguments | Yes (for load_model) | Finetuning configuration including adapter type, training stage, etc. |
| tokenizer | PreTrainedTokenizer | Yes (for load_model) | Previously loaded tokenizer instance |
| is_trainable | bool | No (default: False) | Whether the model will be used for training (enables gradient computation) |
| add_valuehead | bool | No (default: False) | Whether to wrap the model with a value head for RLHF training |
Outputs
| Name | Type | Description |
|---|---|---|
| TokenizerModule | TypedDict | Returned by load_tokenizer. Dictionary containing tokenizer (PreTrainedTokenizer) and processor (Optional ProcessorMixin) |
| PretrainedConfig | PretrainedConfig | Returned by load_config. Model configuration object |
| PreTrainedModel | PreTrainedModel | Returned by load_model. Fully initialized and patched model ready for training or inference |
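Because processor is optional, callers need to handle the case where it is absent. The following is a minimal sketch, assuming processor is None for text-only models. `TokenizerModuleSketch` and `describe_module` are illustrative names that do not exist in the library, and `Any` stands in for the real tokenizer and processor classes.

```python
from typing import Any, Optional, TypedDict


class TokenizerModuleSketch(TypedDict):
    """Illustrative mirror of TokenizerModule; Any replaces the real classes."""

    tokenizer: Any
    processor: Optional[Any]


def describe_module(module: TokenizerModuleSketch) -> str:
    """Classify a loaded module by whether a processor is present."""
    if module["processor"] is None:
        return "text-only"
    return "multimodal"


# Placeholder objects stand in for a real tokenizer and processor.
text_module: TokenizerModuleSketch = {"tokenizer": object(), "processor": None}
print(describe_module(text_module))  # prints "text-only"
```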
Usage Examples
from llamafactory.model.loader import load_tokenizer, load_model

# model_args and finetuning_args are assumed to be already-parsed
# ModelArguments and FinetuningArguments instances.

# Load tokenizer
tokenizer_module = load_tokenizer(model_args)
tokenizer = tokenizer_module["tokenizer"]
processor = tokenizer_module["processor"]  # optional; may be None

# Load model for training
model = load_model(
    tokenizer=tokenizer,
    model_args=model_args,
    finetuning_args=finetuning_args,
    is_trainable=True,
)

# Load model for inference
model = load_model(
    tokenizer=tokenizer,
    model_args=model_args,
    finetuning_args=finetuning_args,
    is_trainable=False,
)

# Load model with value head for PPO/RLHF
model = load_model(
    tokenizer=tokenizer,
    model_args=model_args,
    finetuning_args=finetuning_args,
    is_trainable=True,
    add_valuehead=True,
)
Related Pages
- Hiyouga_LLaMA_Factory_Attention_Config - Attention implementation configuration applied during model loading
- Hiyouga_LLaMA_Factory_Gradient_Checkpointing - Gradient checkpointing setup applied after model loading
- Hiyouga_LLaMA_Factory_Embedding_Resize - Embedding resizing applied during model patching
- Hiyouga_LLaMA_Factory_KTransformers_Integration - KTransformers backend used as an alternative loading path
- Hiyouga_LLaMA_Factory_Liger_Kernel - Liger Kernel applied during model loading
- Hiyouga_LLaMA_Factory_MoE_Config - MoE configuration applied during model loading