Implementation:Hiyouga LLaMA Factory Model Loader

From Leeroopedia


Knowledge Sources
Domains: Model Loading, LLM Training
Last Updated: 2026-02-06 19:00 GMT

Overview

Central orchestrator for loading pretrained models, tokenizers, and configurations, supporting multiple backends including HuggingFace, Unsloth, KTransformers, and Mixture-of-Depths.

Description

This module provides three primary functions that together form the model loading pipeline. load_tokenizer loads a pretrained tokenizer and an optional processor from the HuggingFace Hub (or an alternative hub) and applies framework-specific patches. load_config retrieves the model configuration. load_model orchestrates the full pipeline: it patches the configuration, applies Liger Kernel optimizations, selects the appropriate loading path (KTransformers, Unsloth, Mixture-of-Depths, or the standard AutoModel), initializes adapters, optionally wraps the model with a value head for RLHF, and finishes by logging parameter statistics. The module also defines TokenizerModule, a typed dictionary that holds the tokenizer and the optional processor.
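The loading-path selection described above can be pictured as a simple priority dispatch. The sketch below is illustrative only: `LoaderFlags`, its field names, and `select_loading_path` are hypothetical stand-ins rather than names from the module, and the priority order is an assumption taken from the order in which the paths are listed here. The actual conditions in the source may be more involved.

```python
from dataclasses import dataclass


@dataclass
class LoaderFlags:
    """Hypothetical stand-in for the backend switches carried by ModelArguments."""

    use_ktransformers: bool = False
    use_unsloth: bool = False
    use_mixture_of_depths: bool = False


def select_loading_path(flags: LoaderFlags) -> str:
    """Return the first enabled loading path, falling back to the default.

    The priority order is assumed from the description's listing.
    """
    if flags.use_ktransformers:
        return "ktransformers"
    if flags.use_unsloth:
        return "unsloth"
    if flags.use_mixture_of_depths:
        return "mixture_of_depths"
    return "auto_model"  # standard AutoModel path


if __name__ == "__main__":
    print(select_loading_path(LoaderFlags()))
    print(select_loading_path(LoaderFlags(use_unsloth=True)))
```

Keeping the choice in one function with a single default branch makes the fallback to the standard path explicit whenever no specialized backend is requested.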

Usage

Use load_tokenizer and load_model as the primary entry points whenever a model needs to be loaded for training, inference, or export. They cover the supported configuration combinations, including quantized models, multimodal models (image-text, audio-text), and adapter loading.

Code Reference

Source Location

Signature

from typing import Any, Optional, TypedDict

class TokenizerModule(TypedDict):
    tokenizer: "PreTrainedTokenizer"
    processor: Optional["ProcessorMixin"]

def _get_init_kwargs(model_args: "ModelArguments") -> dict[str, Any]:
    ...

def load_tokenizer(model_args: "ModelArguments") -> "TokenizerModule":
    ...

def load_config(model_args: "ModelArguments") -> "PretrainedConfig":
    ...

def load_model(
    tokenizer: "PreTrainedTokenizer",
    model_args: "ModelArguments",
    finetuning_args: "FinetuningArguments",
    is_trainable: bool = False,
    add_valuehead: bool = False,
) -> "PreTrainedModel":
    ...

Import

from llamafactory.model.loader import load_tokenizer, load_config, load_model

I/O Contract

Inputs

Name | Type | Required | Description
model_args | ModelArguments | Yes | Model configuration including model path, trust_remote_code, quantization settings, etc.
finetuning_args | FinetuningArguments | Yes (for load_model) | Finetuning configuration including adapter type, training stage, etc.
tokenizer | PreTrainedTokenizer | Yes (for load_model) | Previously loaded tokenizer instance
is_trainable | bool | No (default: False) | Whether the model will be used for training (enables gradient computation)
add_valuehead | bool | No (default: False) | Whether to wrap the model with a value head for RLHF training

Outputs

Name | Type | Description
TokenizerModule | TypedDict | Dictionary containing tokenizer (PreTrainedTokenizer) and processor (Optional ProcessorMixin) (from load_tokenizer)
PretrainedConfig | PretrainedConfig | Model configuration object (from load_config)
PreTrainedModel | PreTrainedModel | Fully initialized and patched model ready for training or inference (from load_model)
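load_model is described as finishing with parameter statistics logging before it returns the model. A minimal sketch of what such a summary could compute is shown below. `ParamSketch`, `count_parameters`, and `format_stats` are hypothetical names used for illustration, and the message format is an assumption rather than the module's actual log line.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class ParamSketch:
    """Hypothetical stand-in for a model parameter tensor."""

    numel: int  # number of scalar elements in the parameter
    requires_grad: bool  # True if training would update this parameter


def count_parameters(params: Iterable[ParamSketch]) -> Tuple[int, int]:
    """Return (trainable, total) element counts over all parameters."""
    trainable = 0
    total = 0
    for param in params:
        total += param.numel
        if param.requires_grad:
            trainable += param.numel
    return trainable, total


def format_stats(trainable: int, total: int) -> str:
    """Build a one-line summary; the wording is illustrative only."""
    percent = 100.0 * trainable / total if total else 0.0
    return (
        f"trainable params: {trainable:,} || "
        f"all params: {total:,} || trainable%: {percent:.4f}"
    )


if __name__ == "__main__":
    # A frozen base weight plus a small trainable adapter weight.
    params = [ParamSketch(75, False), ParamSketch(25, True)]
    print(format_stats(*count_parameters(params)))
```

With is_trainable=False one would expect the trainable count to be zero, which makes a summary like this a quick sanity check that a model was loaded in the intended mode.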

Usage Examples

from llamafactory.model.loader import load_tokenizer, load_model

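# Assumes model_args (ModelArguments) and finetuning_args (FinetuningArguments)
# have already been constructed.

# Load tokenizer (and the optional processor)
tokenizer_module = load_tokenizer(model_args)
tokenizer = tokenizer_module["tokenizer"]
processor = tokenizer_module["processor"]  # typed as Optional, so it may be None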

# Load model for training
model = load_model(
    tokenizer=tokenizer,
    model_args=model_args,
    finetuning_args=finetuning_args,
    is_trainable=True,
)

# Load model for inference
model = load_model(
    tokenizer=tokenizer,
    model_args=model_args,
    finetuning_args=finetuning_args,
    is_trainable=False,
)

# Load model with value head for PPO/RLHF
model = load_model(
    tokenizer=tokenizer,
    model_args=model_args,
    finetuning_args=finetuning_args,
    is_trainable=True,
    add_valuehead=True,
)
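Because the processor entry of TokenizerModule is typed as Optional, calling code may want to guard against None before using it. The following sketch shows one way to do that. `TokenizerModuleSketch` and `pick_preprocessor` are hypothetical names that only mirror the shape of TokenizerModule, and falling back to the tokenizer is a design assumption, not behavior confirmed by the module.

```python
from typing import Any, Optional, TypedDict


class TokenizerModuleSketch(TypedDict):
    """Stand-in mirroring the shape of TokenizerModule (hypothetical name)."""

    tokenizer: Any
    processor: Optional[Any]


def pick_preprocessor(module: TokenizerModuleSketch) -> Any:
    """Prefer the processor when present; otherwise fall back to the tokenizer."""
    processor = module["processor"]
    return processor if processor is not None else module["tokenizer"]


if __name__ == "__main__":
    # Placeholder strings stand in for real tokenizer and processor objects.
    text_only = TokenizerModuleSketch(tokenizer="tok", processor=None)
    multimodal = TokenizerModuleSketch(tokenizer="tok", processor="proc")
    print(pick_preprocessor(text_only))
    print(pick_preprocessor(multimodal))
```

An explicit `is not None` check is used instead of a truthiness test so that a valid but falsy processor object would not be silently skipped.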
