Implementation:Hiyouga LLaMA Factory Model Loader

Knowledge Sources	Hiyouga_LLaMA_Factory
Domains	Model Loading, LLM Training
Last Updated	2026-02-06 19:00 GMT

Overview

Central orchestrator for loading pretrained models, tokenizers, and configurations, supporting multiple backends including HuggingFace, Unsloth, KTransformers, and Mixture-of-Depths.

Description

This module provides three primary functions that form the model loading pipeline. load_tokenizer loads a pretrained tokenizer and optional processor from HuggingFace Hub (or alternative hubs), applying framework-specific patches. load_config retrieves the model configuration. load_model orchestrates the full loading pipeline: it patches the configuration, applies Liger Kernel optimizations, selects the appropriate loading path (KTransformers, Unsloth, Mixture-of-Depths, or standard AutoModel), initializes adapters, optionally wraps with a value head for RLHF, and finalizes with parameter statistics logging. The module also defines TokenizerModule as a typed dictionary containing the tokenizer and optional processor.
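The backend selection step described above can be pictured as a simple priority dispatch. The following is a minimal sketch rather than the module's actual code: the `LoaderFlags` class, the flag names (`use_kt`, `use_unsloth`, `use_mod`), and the order in which the backends are checked are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LoaderFlags:
    """Hypothetical stand-in for the backend switches carried by ModelArguments."""

    use_kt: bool = False       # assumed flag: KTransformers backend
    use_unsloth: bool = False  # assumed flag: Unsloth backend
    use_mod: bool = False      # assumed flag: Mixture-of-Depths loading


def select_loading_path(flags: LoaderFlags) -> str:
    """Return the first enabled backend, falling back to the standard path."""
    if flags.use_kt:
        return "ktransformers"
    if flags.use_unsloth:
        return "unsloth"
    if flags.use_mod:
        return "mixture_of_depths"
    # No alternative backend requested: use the standard AutoModel path.
    return "auto_model"
```

The point of the sketch is only that the alternative backends are mutually exclusive branches and the standard AutoModel path acts as the default when none of them is requested.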

Usage

Use load_tokenizer and load_model as the primary entry points whenever a model needs to be loaded for training, inference, or export. Call load_tokenizer first, because load_model takes the resulting tokenizer as an argument. Together these functions cover quantized models, multimodal models (image-text, audio-text), and adapter loading.

Code Reference

Source Location

Repository: Hiyouga_LLaMA_Factory
File: src/llamafactory/model/loader.py
Lines: 1-247

Signature

class TokenizerModule(TypedDict):
    tokenizer: "PreTrainedTokenizer"
    processor: Optional["ProcessorMixin"]

def _get_init_kwargs(model_args: "ModelArguments") -> dict[str, Any]:
    ...

def load_tokenizer(model_args: "ModelArguments") -> "TokenizerModule":
    ...

def load_config(model_args: "ModelArguments") -> "PretrainedConfig":
    ...

def load_model(
    tokenizer: "PreTrainedTokenizer",
    model_args: "ModelArguments",
    finetuning_args: "FinetuningArguments",
    is_trainable: bool = False,
    add_valuehead: bool = False,
) -> "PreTrainedModel":
    ...

Import

from llamafactory.model.loader import load_tokenizer, load_config, load_model

I/O Contract

Inputs

Name	Type	Required	Description
model_args	ModelArguments	Yes	Model configuration including model path, trust_remote_code, quantization settings, etc.
finetuning_args	FinetuningArguments	Yes (for load_model)	Finetuning configuration including adapter type, training stage, etc.
tokenizer	PreTrainedTokenizer	Yes (for load_model)	Previously loaded tokenizer instance
is_trainable	bool	No (default: False)	Whether the model will be used for training (enables gradient computation)
add_valuehead	bool	No (default: False)	Whether to wrap the model with a value head for RLHF training

Outputs

Function	Return Type	Description
load_tokenizer	TokenizerModule	TypedDict containing tokenizer (PreTrainedTokenizer) and processor (Optional ProcessorMixin)
load_config	PretrainedConfig	Model configuration object
load_model	PreTrainedModel	Fully initialized and patched model ready for training or inference
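Because the processor entry is optional, callers generally need to check it before use. The sketch below mirrors the shape of TokenizerModule with placeholder types so that it runs without the library installed. `TokenizerModuleSketch` and `describe_module` are hypothetical names, and the sketch assumes the processor is `None` when no processor was loaded.

```python
from typing import Any, Optional, TypedDict


class TokenizerModuleSketch(TypedDict):
    """Same keys as TokenizerModule; Any replaces the real classes."""

    tokenizer: Any
    processor: Optional[Any]


def describe_module(module: TokenizerModuleSketch) -> str:
    """Report whether a processor accompanies the tokenizer."""
    # Assumption: a missing processor is represented as None.
    if module["processor"] is None:
        return "text-only"
    return "multimodal"
```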

Usage Examples

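from llamafactory.model.loader import load_tokenizer, load_model

# model_args (ModelArguments) and finetuning_args (FinetuningArguments)
# are assumed to have been constructed before this point.

# Load tokenizer
tokenizer_module = load_tokenizer(model_args)
tokenizer = tokenizer_module["tokenizer"]
processor = tokenizer_module["processor"]  # Optional; check for None before use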

# Load model for training
model = load_model(
    tokenizer=tokenizer,
    model_args=model_args,
    finetuning_args=finetuning_args,
    is_trainable=True,
)

# Load model for inference
model = load_model(
    tokenizer=tokenizer,
    model_args=model_args,
    finetuning_args=finetuning_args,
    is_trainable=False,
)

# Load model with value head for PPO/RLHF
model = load_model(
    tokenizer=tokenizer,
    model_args=model_args,
    finetuning_args=finetuning_args,
    is_trainable=True,
    add_valuehead=True,
)
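The three calls above differ only in their two boolean flags. As a minimal sketch, the combinations can be collected in a small lookup. `load_model_flags` and the purpose labels ("train", "infer", "ppo") are hypothetical and not part of the module.

```python
def load_model_flags(purpose: str) -> dict[str, bool]:
    """Map a usage purpose to the boolean keyword arguments shown above."""
    presets = {
        "train": {"is_trainable": True, "add_valuehead": False},
        "infer": {"is_trainable": False, "add_valuehead": False},
        "ppo": {"is_trainable": True, "add_valuehead": True},
    }
    if purpose not in presets:
        raise ValueError(f"unknown purpose: {purpose!r}")
    # Return a copy so callers cannot mutate the presets.
    return dict(presets[purpose])
```

With such a helper, a call could be written as load_model(tokenizer=tokenizer, model_args=model_args, finetuning_args=finetuning_args, **load_model_flags("ppo")).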

Related Pages

Hiyouga_LLaMA_Factory_Attention_Config - Attention implementation configuration applied during model loading
Hiyouga_LLaMA_Factory_Gradient_Checkpointing - Gradient checkpointing setup applied after model loading
Hiyouga_LLaMA_Factory_Embedding_Resize - Embedding resizing applied during model patching
Hiyouga_LLaMA_Factory_KTransformers_Integration - KTransformers backend used as an alternative loading path
Hiyouga_LLaMA_Factory_Liger_Kernel - Liger Kernel applied during model loading
Hiyouga_LLaMA_Factory_MoE_Config - MoE configuration applied during model loading
