
Implementation:Hiyouga LLaMA Factory KTransformers Integration

From Leeroopedia
Revision as of 15:06, 16 February 2026 by Admin (auto-imported from implementations/Hiyouga_LLaMA_Factory_KTransformers_Integration.md)


Knowledge Sources
Domains: Model Loading, CPU-GPU Hybrid Inference
Last Updated: 2026-02-06 19:00 GMT

Overview

Integrates KTransformers for CPU/GPU hybrid model loading, enabling efficient inference and fine-tuning of very large models by offloading expert layers to CPU using optimized GGUF-based kernels.

Description

This module provides three public functions for KTransformers integration, plus the private helper _get_kt_kwargs. load_kt_pretrained_model loads a model through KTransformers' custom model implementations for the supported architectures (DeepSeek V2/V3, Qwen2Moe, Qwen3Moe, Llama, Mixtral): it instantiates the model on the meta device, so no weight memory is allocated at construction time, and then applies GGUF-based kernel optimization and loads the weights via optimize_and_load_gguf. get_kt_peft_model wraps a pretrained model with KTransformers-specific PEFT (Parameter-Efficient Fine-Tuning) support. load_kt_peft_model loads previously trained LoRA adapters in either GGUF or safetensors format, injecting LoRA layers into the model and mapping the adapter's weight keys to model parameters.

The module also configures the attention implementation per architecture and supports a long-context mode for Llama models.
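The architecture gate at the start of load_kt_pretrained_model can be pictured with a check like the one below. It is a minimal sketch, assuming matching on the standard Hugging Face model_type strings for the architectures listed above; the function and constant names are hypothetical, and LLaMA-Factory's real dispatch may key on other config fields.

```python
# Hypothetical sketch: only architectures with a KTransformers custom implementation
# are accepted. The model_type strings are the usual Hugging Face identifiers
# (an assumption about how the real module matches them).
KT_SUPPORTED_MODEL_TYPES = {
    "deepseek_v2",
    "deepseek_v3",
    "qwen2_moe",
    "qwen3_moe",
    "llama",
    "mixtral",
}


def check_kt_supported(config: dict) -> str:
    """Return the config's model_type if it is in the supported set, else raise."""
    model_type = config.get("model_type")
    if model_type not in KT_SUPPORTED_MODEL_TYPES:
        raise ValueError(
            f"no KTransformers implementation for model_type={model_type!r}; "
            f"supported: {sorted(KT_SUPPORTED_MODEL_TYPES)}"
        )
    return model_type


if __name__ == "__main__":
    print(check_kt_supported({"model_type": "deepseek_v3"}))  # deepseek_v3
```

Failing fast on an unsupported model_type keeps the error close to its cause, before any meta-device construction or GGUF loading starts.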

Usage

Use this module to train or run inference on very large MoE models (e.g., DeepSeek V3, Qwen3 MoE) whose weights exceed the memory of a single GPU; KTransformers makes them fit by offloading expert layers to the CPU. Enable it with the use_kt model argument, and set kt_optimize_rule to a YAML configuration file that specifies the optimization rules.
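Whether a model "exceeds single-GPU memory" is simple arithmetic on its weights. The sketch below estimates weights-only memory from a parameter count; the 671B total-parameter figure is DeepSeek V3's published size, while the 80 GB per-GPU capacity is an assumed example. It ignores activations, KV cache, gradients, and optimizer state, so real requirements are higher.

```python
import math

# Bytes per parameter for common weight formats (int4 packs two weights per byte).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_gb(num_params: float, dtype: str) -> float:
    """Weights-only memory in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def gpus_needed(num_params: float, dtype: str, gpu_gb: float = 80.0) -> int:
    """Minimum number of gpu_gb-sized GPUs needed just to hold the weights."""
    return math.ceil(weight_gb(num_params, dtype) / gpu_gb)


if __name__ == "__main__":
    deepseek_v3_params = 671e9  # total parameters across all experts
    for dtype in ("bf16", "fp8", "int4"):
        print(dtype, weight_gb(deepseek_v3_params, dtype), gpus_needed(deepseek_v3_params, dtype))
    # bf16 1342.0 17
    # fp8 671.0 9
    # int4 335.5 5
```

Even at 4-bit precision the weights alone are several times larger than one 80 GB GPU, which is the situation where keeping expert layers in CPU memory pays off.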

Code Reference

Source Location

Signature

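from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from transformers import PretrainedConfig, PreTrainedModel

    from llamafactory.hparams import FinetuningArguments, ModelArguments


def _get_kt_kwargs(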
    config: "PretrainedConfig",
    model_name_or_path: str,
    model_args: "ModelArguments",
    finetuning_args: "FinetuningArguments",
) -> dict[str, Any]:
    ...

def load_kt_pretrained_model(
    config: "PretrainedConfig",
    model_args: "ModelArguments",
) -> "PreTrainedModel":
    ...

def get_kt_peft_model(
    model: "PreTrainedModel",
    peft_kwargs: dict[str, Any],
) -> "PreTrainedModel":
    ...

def load_kt_peft_model(
    model_args: "ModelArguments",
    model: "PreTrainedModel",
) -> "PreTrainedModel":
    ...

Import

from llamafactory.model.model_utils.ktransformers import load_kt_pretrained_model, get_kt_peft_model, load_kt_peft_model

I/O Contract

Inputs

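| Name | Type | Required | Description |
|---|---|---|---|
| config | PretrainedConfig | Yes (for _get_kt_kwargs, load_kt_pretrained_model) | Model configuration object |
| model_name_or_path | str | Yes (for _get_kt_kwargs) | Model name or local path |
| model_args | ModelArguments | Yes (for _get_kt_kwargs, load_kt_pretrained_model, load_kt_peft_model) | Model arguments, including use_kt, model_name_or_path, kt_optimize_rule, cpu_infer, chunk_size, mode, and trust_remote_code |
| finetuning_args | FinetuningArguments | Yes (for _get_kt_kwargs) | Finetuning arguments, including finetuning_type |
| model | PreTrainedModel | Yes (for get_kt_peft_model, load_kt_peft_model) | Previously loaded pretrained model |
| peft_kwargs | dict[str, Any] | Yes (for get_kt_peft_model) | PEFT configuration dictionary for LoRA setup |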

Outputs

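| Name | Type | Description |
|---|---|---|
| model | PreTrainedModel | From load_kt_pretrained_model: the KTransformers-optimized model with GGUF kernels loaded, ready for training or inference. From get_kt_peft_model and load_kt_peft_model: the model with LoRA layers attached. |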

Usage Examples

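from transformers import AutoConfig

from llamafactory.hparams import ModelArguments
from llamafactory.model.model_utils.ktransformers import (
    get_kt_peft_model,
    load_kt_peft_model,
    load_kt_pretrained_model,
)

# Load a DeepSeek V3 model with KTransformers CPU/GPU hybrid loading.
# Requires model_args.use_kt = True and model_args.kt_optimize_rule = "path/to/rules.yaml".
model_args = ModelArguments(
    model_name_or_path="path/to/DeepSeek-V3",  # placeholder path
    use_kt=True,
    kt_optimize_rule="path/to/rules.yaml",  # placeholder path
    trust_remote_code=True,
)
config = AutoConfig.from_pretrained(model_args.model_name_or_path, trust_remote_code=True)
model = load_kt_pretrained_model(config, model_args)

# Use one of the two paths below.
# (a) Load an existing LoRA adapter into the KTransformers model
#     (model_args must identify the adapter; assumed here: adapter_name_or_path).
model = load_kt_peft_model(model_args, model)

# (b) Create a new PEFT model for KTransformers training.
#     Illustrative LoRA settings; the exact keys get_kt_peft_model expects are an assumption.
peft_kwargs = {"r": 8, "lora_alpha": 16, "lora_dropout": 0.0}
peft_model = get_kt_peft_model(model, peft_kwargs)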
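load_kt_peft_model accepts adapters in GGUF or safetensors format and maps the adapter's weight keys onto model parameters. The sketch below illustrates both ideas with the standard library only: it tells the formats apart by their headers (GGUF files begin with the magic bytes GGUF; safetensors files begin with an 8-byte little-endian header length followed by a JSON header) and remaps PEFT-style saved LoRA keys by inserting the adapter name, the way PEFT names LoRA parameters in memory. The helper names and the remapping rule are assumptions for illustration, not LLaMA-Factory's or KTransformers' actual logic, which may use different key conventions.

```python
import json
import os
import struct
import tempfile


def detect_adapter_format(path: str) -> str:
    """Return "gguf" or "safetensors" by inspecting the file header (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head[:4] == b"GGUF":  # GGUF files begin with the 4-byte magic b"GGUF"
        return "gguf"
    if len(head) == 8:
        # safetensors files begin with a little-endian u64 giving the JSON header length
        (header_len,) = struct.unpack("<Q", head)
        if 0 < header_len <= os.path.getsize(path) - 8:
            return "safetensors"
    raise ValueError(f"unrecognized adapter format: {path}")


def safetensors_tensor_names(path: str) -> list[str]:
    """List tensor names from a safetensors header without reading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional string-to-string map, not a tensor
    return sorted(name for name in header if name != "__metadata__")


def to_param_name(saved_key: str, adapter_name: str = "default") -> str:
    """Map a PEFT-style saved LoRA key to its in-model parameter name (assumed rule)."""
    for marker in (".lora_A.", ".lora_B."):
        if marker in saved_key:
            prefix, suffix = saved_key.split(marker, 1)
            return f"{prefix}{marker}{adapter_name}.{suffix}"
    return saved_key  # non-LoRA keys pass through unchanged


if __name__ == "__main__":
    # Build a tiny one-tensor safetensors file by hand to exercise the helpers.
    key = "base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight"
    header = json.dumps({key: {"dtype": "F32", "shape": [1, 1], "data_offsets": [0, 4]}}).encode()
    with tempfile.NamedTemporaryFile(suffix=".safetensors", delete=False) as f:
        f.write(struct.pack("<Q", len(header)) + header + struct.pack("<f", 0.0))
    print(detect_adapter_format(f.name))  # safetensors
    print([to_param_name(k) for k in safetensors_tensor_names(f.name)])
    os.remove(f.name)
```

Reading only the header keeps the format check and key listing cheap, since tensor data in a large adapter never has to be loaded to decide how to map it.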
