Implementation:Hiyouga LLaMA Factory KTransformers Integration
| Knowledge Sources | |
|---|---|
| Domains | Model Loading, CPU-GPU Hybrid Inference |
| Last Updated | 2026-02-06 19:00 GMT |
Overview
Integrates KTransformers for CPU/GPU hybrid model loading, enabling efficient inference and fine-tuning of very large models by offloading expert layers to CPU using optimized GGUF-based kernels.
Description
This module provides three public functions for KTransformers integration, plus a private helper, _get_kt_kwargs:
- load_kt_pretrained_model loads a model using KTransformers' custom implementations for supported architectures (DeepSeek V2/V3, Qwen2Moe, Qwen3Moe, Llama, Mixtral). It instantiates the model on a meta device, so no weight memory is allocated at construction time, then applies GGUF-based kernel optimization and loads the weights via optimize_and_load_gguf.
- get_kt_peft_model wraps a pretrained model with KTransformers-specific PEFT (Parameter-Efficient Fine-Tuning) support.
- load_kt_peft_model loads previously trained LoRA adapters from either GGUF or safetensors files, injecting LoRA layers and mapping adapter weight keys to model parameters.
The module also configures the attention implementation per architecture and supports a long-context mode for Llama models.
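The per-architecture decisions described above can be pictured with a small dispatch table. The following is a minimal sketch under stated assumptions, not the module's code: plan_kt_load, the attention values in ATTN_BY_FAMILY, the default, and the "long_context" mode string are all hypothetical and chosen only for illustration. Check the source file for the actual choices.

```python
# Illustrative sketch only: every name and value here is an assumption.

# Hugging Face architecture names for the families listed in the description.
SUPPORTED_ARCHITECTURES = {
    "DeepseekV2ForCausalLM",
    "DeepseekV3ForCausalLM",
    "Qwen2MoeForCausalLM",
    "Qwen3MoeForCausalLM",
    "LlamaForCausalLM",
    "MixtralForCausalLM",
}

# Hypothetical attention choices, keyed by a substring of the architecture name.
ATTN_BY_FAMILY = {
    "Llama": "eager",
    "Qwen2Moe": "flash_attention_2",
    "Mixtral": "flash_attention_2",
}
DEFAULT_ATTN = "flash_attention_2"  # assumed fallback


def plan_kt_load(architecture: str, mode: str = "normal") -> dict:
    """Decide which loading path and attention implementation to use."""
    # Long-context mode is described as Llama-only, so reject anything else.
    if mode == "long_context" and architecture != "LlamaForCausalLM":
        raise ValueError("long-context mode is only supported for Llama models")

    attn = DEFAULT_ATTN
    for family, implementation in ATTN_BY_FAMILY.items():
        if family in architecture:
            attn = implementation
            break

    return {
        "architecture": architecture,
        # True: use a KTransformers custom class; False: generic fallback path.
        "use_custom_class": architecture in SUPPORTED_ARCHITECTURES,
        "attn_implementation": attn,
    }
```

Keeping the decision in a pure function like this makes it easy to test without allocating a model; the real module performs the equivalent choice while building the model on the meta device.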
Usage
Use this module to train or run inference on very large MoE models (e.g., DeepSeek V3, Qwen3 MoE) that do not fit in the memory of a single GPU; KTransformers makes this possible by offloading to the CPU. To enable it, set the use_kt model argument and point kt_optimize_rule at a YAML configuration file that specifies the optimization rules.
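The two settings named above depend on each other: enabling use_kt without an optimization rule file leaves the loader with nothing to apply. A minimal sketch of a pre-flight check follows. KTArgsSketch and validate_kt_args are hypothetical stand-ins written for this page (the real ModelArguments class has many more fields), and the file-extension check is an assumption based only on the rule file being described as YAML.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KTArgsSketch:
    """Hypothetical stand-in for the KTransformers-related model arguments."""

    model_name_or_path: str
    use_kt: bool = False
    kt_optimize_rule: Optional[str] = None


def validate_kt_args(args: KTArgsSketch) -> None:
    """Raise ValueError if KTransformers is enabled without a usable rule file path."""
    if not args.use_kt:
        return  # nothing to check when KTransformers is disabled
    if not args.kt_optimize_rule:
        raise ValueError("use_kt is set but kt_optimize_rule is missing")
    # Assumption: the rule file is YAML, so expect a .yaml or .yml suffix.
    if not args.kt_optimize_rule.endswith((".yaml", ".yml")):
        raise ValueError("kt_optimize_rule should point to a YAML file")


# Example: a configuration that passes the check.
validate_kt_args(
    KTArgsSketch(
        model_name_or_path="path/to/model",
        use_kt=True,
        kt_optimize_rule="path/to/rules.yaml",
    )
)
```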
Code Reference
Source Location
- Repository: Hiyouga_LLaMA_Factory
- File: src/llamafactory/model/model_utils/ktransformers.py
- Lines: 1-154
Signature
def _get_kt_kwargs(
    config: "PretrainedConfig",
    model_name_or_path: str,
    model_args: "ModelArguments",
    finetuning_args: "FinetuningArguments",
) -> dict[str, Any]:
    ...

def load_kt_pretrained_model(
    config: "PretrainedConfig",
    model_args: "ModelArguments",
) -> "PreTrainedModel":
    ...

def get_kt_peft_model(
    model: "PreTrainedModel",
    peft_kwargs: dict[str, Any],
) -> "PreTrainedModel":
    ...

def load_kt_peft_model(
    model_args: "ModelArguments",
    model: "PreTrainedModel",
) -> "PreTrainedModel":
    ...
Import
from llamafactory.model.model_utils.ktransformers import load_kt_pretrained_model, get_kt_peft_model, load_kt_peft_model
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | PretrainedConfig | Yes (for _get_kt_kwargs, load_kt_pretrained_model) | Model configuration object |
| model_args | ModelArguments | Yes | Model arguments including model_name_or_path, kt_optimize_rule, cpu_infer, chunk_size, mode, and trust_remote_code |
| finetuning_args | FinetuningArguments | Yes (for _get_kt_kwargs) | Finetuning arguments including finetuning_type |
| model | PreTrainedModel | Yes (for get_kt_peft_model, load_kt_peft_model) | Previously loaded pretrained model |
| peft_kwargs | dict[str, Any] | Yes (for get_kt_peft_model) | PEFT configuration dictionary for LoRA setup |
Outputs
| Name | Type | Description |
|---|---|---|
| model | PreTrainedModel | KTransformers-optimized model with GGUF kernels loaded and ready for training or inference |
Usage Examples
from llamafactory.model.model_utils.ktransformers import (
    get_kt_peft_model,
    load_kt_peft_model,
    load_kt_pretrained_model,
)

# Load a DeepSeek V3 model with KTransformers CPU/GPU hybrid loading.
# Requires: model_args.use_kt = True, model_args.kt_optimize_rule = "path/to/rules.yaml"
# `config`, `model_args`, and `peft_kwargs` are assumed to have been built by the caller.
model = load_kt_pretrained_model(config, model_args)

# Load an existing LoRA adapter into the KTransformers model
model = load_kt_peft_model(model_args, model)

# Alternatively, wrap the model with a new PEFT adapter for KTransformers training
peft_model = get_kt_peft_model(model, peft_kwargs)
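load_kt_peft_model is described as mapping adapter weight keys to model parameters. The sketch below illustrates what such a mapping can involve. It assumes PEFT-style key names, where saved adapter keys carry a "base_model.model." prefix and omit the adapter name that appears in the live model's parameter names. map_adapter_key and remap_adapter_state are hypothetical helpers, and the actual mapping in the module may differ.

```python
def map_adapter_key(
    key: str,
    adapter_name: str = "default",
    prefix: str = "base_model.model.",
) -> str:
    """Map a saved adapter key to an assumed in-model parameter name."""
    # Drop the wrapper prefix if the saved key carries one.
    if key.startswith(prefix):
        key = key[len(prefix):]
    # Insert the adapter name after lora_A / lora_B when it is not already there.
    for marker in ("lora_A", "lora_B"):
        token = f".{marker}."
        if token in key and f"{token}{adapter_name}." not in key:
            key = key.replace(token, f"{token}{adapter_name}.")
    return key


def remap_adapter_state(adapter_state: dict, model_param_names: set) -> tuple:
    """Return (mapped_state, unmatched_keys) for a saved adapter state dict."""
    mapped, unmatched = {}, []
    for key, value in adapter_state.items():
        target = map_adapter_key(key)
        if target in model_param_names:
            mapped[target] = value
        else:
            unmatched.append(key)  # report keys with no matching parameter
    return mapped, unmatched
```

Returning the unmatched keys instead of silently dropping them makes a naming mismatch between the adapter file and the injected LoRA layers visible to the caller.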
Related Pages
- Hiyouga_LLaMA_Factory_Model_Loader - Main model loader that delegates to KTransformers when use_kt is enabled
- Hiyouga_LLaMA_Factory_MoE_Config - MoE configuration for the MoE models commonly used with KTransformers
- Hiyouga_LLaMA_Factory_Attention_Config - Attention configuration applied alongside KTransformers loading