Principle: Ollama Adapter Conversion

From Leeroopedia
Knowledge Sources
Domains Model_Architecture, Fine_Tuning
Last Updated 2026-02-14 00:00 GMT

Overview

A format conversion mechanism that transforms LoRA adapter weights from SafeTensors/HuggingFace format into GGUF format compatible with the llama.cpp inference engine.

Description

Adapter Conversion bridges the gap between the HuggingFace LoRA adapter format (SafeTensors with adapter_config.json) and the GGUF binary format used by Ollama's inference engine. LoRA (Low-Rank Adaptation) adapters are small weight matrices that modify a base model's behavior without changing its original weights.

The conversion process must correctly map tensor names from HuggingFace conventions to GGUF conventions, handle architecture-specific weight layouts (e.g., Q/K head interleaving for LLaMA), and set the correct GGUF metadata keys for the adapter's rank, alpha, and target layers.
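The name mapping can be illustrated with a small sketch. The real mapping table lives in llama.cpp's converter and covers many more tensors; the suffix map and name patterns below are a simplified, assumed subset for illustration.

```python
import re

# Hypothetical subset of the HF -> GGUF projection-name map; the real
# converter in llama.cpp handles many more tensor kinds and architectures.
SUFFIX_MAP = {
    "self_attn.q_proj": "attn_q",
    "self_attn.k_proj": "attn_k",
    "self_attn.v_proj": "attn_v",
    "self_attn.o_proj": "attn_output",
    "mlp.gate_proj": "ffn_gate",
    "mlp.up_proj": "ffn_up",
    "mlp.down_proj": "ffn_down",
}

def hf_lora_name_to_gguf(name: str) -> str:
    """Map an HF LoRA tensor name such as
    'base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight'
    to a GGUF-style name such as 'blk.0.attn_q.weight.lora_a'."""
    m = re.match(r".*layers\.(\d+)\.(.+)\.lora_([AB])\.weight$", name)
    if not m:
        raise ValueError(f"unrecognized tensor name: {name}")
    layer, proj, ab = m.groups()
    return f"blk.{layer}.{SUFFIX_MAP[proj]}.weight.lora_{ab.lower()}"
```

Alongside the tensors, the converter also writes adapter metadata (the adapter type and the LoRA alpha) into the GGUF header so the inference engine can apply the correct scaling.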

Usage

Use this principle when integrating fine-tuned LoRA adapters produced by training frameworks such as Unsloth or Hugging Face PEFT (including QLoRA-style training) into a GGUF-based inference system. In Ollama, the conversion is triggered by the ADAPTER directive in a Modelfile.
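A minimal Modelfile illustrating the directive (the base model name and adapter path are placeholders):

```
FROM llama3.2
ADAPTER ./my-lora-adapter
```

Running `ollama create my-tuned-model -f Modelfile` then builds the model; if the adapter is in SafeTensors format, this is the point at which the conversion described here takes place.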

Theoretical Basis

LoRA adapts a pretrained model by adding low-rank decomposition matrices:

W′ = W + (α / r) · B A

Where:

  • W is the original weight matrix and W′ is the adapted weight matrix
  • B and A are the low-rank adapter matrices (rank r)
  • α is the scaling factor
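The update above can be worked through numerically. The shapes below (d_out = 4, d_in = 6, r = 2, α = 16) are made up for illustration; the point is that B @ A has the same shape as W but rank at most r.

```python
import numpy as np

# Toy LoRA update: W' = W + (alpha / r) * B @ A, with illustrative shapes.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 4, 6, 2, 16.0

W = rng.standard_normal((d_out, d_in)).astype(np.float32)
A = rng.standard_normal((r, d_in)).astype(np.float32)   # lora_A: (r, d_in)
B = rng.standard_normal((d_out, r)).astype(np.float32)  # lora_B: (d_out, r)

delta = (alpha / r) * (B @ A)  # low-rank update, same shape as W
W_prime = W + delta

assert W_prime.shape == W.shape
assert np.linalg.matrix_rank(delta) <= r
```

Because delta is the product of a (d_out, r) and an (r, d_in) matrix, the adapter only needs to store r · (d_out + d_in) values instead of d_out · d_in.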

The conversion maps these A/B matrices from HuggingFace tensor names to GGUF tensor names while preserving the correct shapes and data types.
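For LLaMA-family models, llama.cpp stores the Q and K projection weights with their rotary halves reordered relative to the HF layout, so the converter must apply the same row permutation to the adapter's output-side matrix (lora_B) for q_proj and k_proj; lora_A acts on the input and is left alone. The following is a simplified sketch of such a head-reordering permutation, not a verbatim copy of the converter's code:

```python
import numpy as np

def permute_qk(w: np.ndarray, n_head: int) -> np.ndarray:
    """Reorder the rows of a Q/K-side weight from an interleaved
    per-head layout to a split-halves layout (illustrative sketch)."""
    d = w.shape[0]
    return (w.reshape(n_head, 2, d // n_head // 2, *w.shape[1:])
             .swapaxes(1, 2)
             .reshape(w.shape))
```

The operation is a pure row permutation: shapes and values are preserved, only the row order changes, which is why applying it to lora_B keeps the product B @ A consistent with the permuted base weights.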

Related Pages

Implemented By
