Principle: Ollama Adapter Conversion
| Knowledge Sources | |
|---|---|
| Domains | Model_Architecture, Fine_Tuning |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
A format conversion mechanism that transforms LoRA adapter weights from SafeTensors/HuggingFace format into GGUF format compatible with the llama.cpp inference engine.
Description
Adapter Conversion bridges the gap between the HuggingFace LoRA adapter format (SafeTensors with adapter_config.json) and the GGUF binary format used by Ollama's inference engine. LoRA (Low-Rank Adaptation) adapters are small weight matrices that modify a base model's behavior without changing its original weights.
The conversion process must correctly map tensor names from HuggingFace conventions to GGUF conventions, handle architecture-specific weight layouts (e.g., Q/K head interleaving for LLaMA), and set the correct GGUF metadata keys for the adapter's rank, alpha, and target layers.
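As a concrete illustration of the tensor-name mapping step, the sketch below converts HuggingFace PEFT adapter names into the `blk.N.<proj>.weight.lora_a`/`lora_b` style used by llama.cpp's GGUF adapters. The mapping table and regex here are a minimal, hypothetical subset; real converters cover many more projections (embeddings, norms, output head) and also handle the Q/K permutation, which is not shown.

```python
import re

# Hypothetical subset of the HuggingFace -> GGUF projection-name mapping.
# A production converter covers many more tensors.
HF_TO_GGUF = {
    "q_proj": "attn_q",
    "k_proj": "attn_k",
    "v_proj": "attn_v",
    "o_proj": "attn_output",
    "gate_proj": "ffn_gate",
    "up_proj": "ffn_up",
    "down_proj": "ffn_down",
}

def map_tensor_name(hf_name: str) -> str:
    """Map a PEFT adapter tensor name to a GGUF-style adapter name, e.g.

    'base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight'
        -> 'blk.0.attn_q.weight.lora_a'
    """
    m = re.search(
        r"layers\.(\d+)\.(?:self_attn|mlp)\.(\w+)\.lora_([AB])\.weight",
        hf_name,
    )
    if m is None:
        raise ValueError(f"unrecognized adapter tensor name: {hf_name}")
    layer, proj, ab = m.groups()
    return f"blk.{layer}.{HF_TO_GGUF[proj]}.weight.lora_{ab.lower()}"
```

Keeping the mapping in a flat table makes it easy to audit against the base model's GGUF tensor names: every adapter tensor must pair with an existing base tensor or the inference engine will reject the adapter.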
Usage
Use this principle when integrating fine-tuned LoRA adapters from training frameworks (like Unsloth, PEFT, or QLoRA) into a GGUF-based inference system. The conversion is triggered by the ADAPTER directive in a Modelfile.
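A minimal Modelfile that would trigger the conversion might look like the following; the base model name and adapter path are placeholders, and the `ADAPTER` directive pointing at a SafeTensors adapter directory is what invokes the SafeTensors-to-GGUF conversion path:

```
FROM llama3
ADAPTER ./lora-adapter
```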
Theoretical Basis
LoRA adapts a pretrained model by adding a low-rank decomposition of the weight update:

W' = W + (α / r) · B A

Where:
- W is the original weight matrix (d_out × d_in)
- B (d_out × r) and A (r × d_in) are the low-rank adapter matrices (rank r)
- α is the scaling factor
The conversion maps these A/B matrices from HuggingFace tensor names to GGUF tensor names while preserving the correct shapes and data types.
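The effective-weight formula can be checked numerically. The NumPy sketch below uses toy dimensions (all values here are illustrative, not from any real model) and applies the adapter delta exactly as the formula states, with B initialized randomly rather than to zero as fresh adapters usually are:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 4, 8.0  # toy shapes; rank r, scaling factor alpha

W = rng.standard_normal((d_out, d_in)).astype(np.float32)  # frozen base weight
A = rng.standard_normal((r, d_in)).astype(np.float32)      # lora_A: (r, d_in)
B = rng.standard_normal((d_out, r)).astype(np.float32)     # lora_B: (d_out, r)

# Effective weight after applying the adapter: W' = W + (alpha / r) * B @ A
W_prime = W + (alpha / r) * (B @ A)
```

Because B A has the same shape as W, a converter only needs to preserve the A/B shapes and dtypes faithfully; the inference engine can either materialize W' once or apply the low-rank delta on the fly.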