Principle: Ollama Adapter Conversion
| Knowledge Sources | |
|---|---|
| Domains | Model_Architecture, Fine_Tuning |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
A format conversion mechanism that transforms LoRA adapter weights from SafeTensors/HuggingFace format into GGUF format compatible with the llama.cpp inference engine.
Description
Adapter Conversion bridges the gap between the HuggingFace LoRA adapter format (SafeTensors with adapter_config.json) and the GGUF binary format used by Ollama's inference engine. LoRA (Low-Rank Adaptation) adapters are small weight matrices that modify a base model's behavior without changing its original weights.
The conversion process must correctly map tensor names from HuggingFace conventions to GGUF conventions, handle architecture-specific weight layouts (e.g., Q/K head interleaving for LLaMA), and set the correct GGUF metadata keys for the adapter's rank, alpha, and target layers.
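As a concrete illustration of the tensor-name mapping step, the sketch below converts HuggingFace PEFT adapter names into the `blk.N.<proj>.weight.lora_a`/`lora_b` style used by llama.cpp's GGUF adapters. The mapping table and regex here are a minimal, hypothetical subset; real converters cover many more projections (embeddings, norms, output head) and also handle the Q/K permutation, which is not shown.

```python
import re

# Hypothetical subset of the HuggingFace -> GGUF projection-name mapping.
# A production converter covers many more tensors.
HF_TO_GGUF = {
    "q_proj": "attn_q",
    "k_proj": "attn_k",
    "v_proj": "attn_v",
    "o_proj": "attn_output",
    "gate_proj": "ffn_gate",
    "up_proj": "ffn_up",
    "down_proj": "ffn_down",
}

def map_tensor_name(hf_name: str) -> str:
    """Map a PEFT adapter tensor name to a GGUF-style adapter name, e.g.

    'base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight'
        -> 'blk.0.attn_q.weight.lora_a'
    """
    m = re.search(
        r"layers\.(\d+)\.(?:self_attn|mlp)\.(\w+)\.lora_([AB])\.weight",
        hf_name,
    )
    if m is None:
        raise ValueError(f"unrecognized adapter tensor name: {hf_name}")
    layer, proj, ab = m.groups()
    return f"blk.{layer}.{HF_TO_GGUF[proj]}.weight.lora_{ab.lower()}"
```

Keeping the mapping in a flat table makes it easy to audit against the base model's GGUF tensor names: every adapter tensor must pair with an existing base tensor or the inference engine will reject the adapter.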
Usage
Use this principle when integrating fine-tuned LoRA adapters from training frameworks (like Unsloth, PEFT, or QLoRA) into a GGUF-based inference system. The conversion is triggered by the ADAPTER directive in a Modelfile.
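A minimal Modelfile that would trigger the conversion might look like the following; the base model name and adapter path are placeholders, and the `ADAPTER` directive pointing at a SafeTensors adapter directory is what invokes the SafeTensors-to-GGUF conversion path:

```
FROM llama3
ADAPTER ./lora-adapter
```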
Theoretical Basis
LoRA adapts a pretrained model by adding a low-rank decomposition of the weight update:

W' = W + (α / r) · B A

Where:
- W is the original weight matrix (d_out × d_in)
- B (d_out × r) and A (r × d_in) are the low-rank adapter matrices (rank r)
- α is the scaling factor
The conversion maps these A/B matrices from HuggingFace tensor names to GGUF tensor names while preserving the correct shapes and data types.
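The effective-weight formula can be checked numerically. The NumPy sketch below uses toy dimensions (all values here are illustrative, not from any real model) and applies the adapter delta exactly as the formula states, with B initialized randomly rather than to zero as fresh adapters usually are:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 4, 8.0  # toy shapes; rank r, scaling factor alpha

W = rng.standard_normal((d_out, d_in)).astype(np.float32)  # frozen base weight
A = rng.standard_normal((r, d_in)).astype(np.float32)      # lora_A: (r, d_in)
B = rng.standard_normal((d_out, r)).astype(np.float32)     # lora_B: (d_out, r)

# Effective weight after applying the adapter: W' = W + (alpha / r) * B @ A
W_prime = W + (alpha / r) * (B @ A)
```

Because B A has the same shape as W, a converter only needs to preserve the A/B shapes and dtypes faithfully; the inference engine can either materialize W' once or apply the low-rank delta on the fly.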