Implementation:Ollama Ollama Convert Llama Adapter

Knowledge Sources	Ollama
Domains	Model Conversion, GGUF Format
Last Updated	2025-02-15 00:00 GMT

Overview

Implements the GGUF adapter converter for Llama LoRA adapters. It handles three things: detecting tensors that are stored transposed, applying interleaved head reordering to Q/K weights, and mapping LoRA A/B tensor names to GGUF naming conventions.

Description

The llamaAdapter struct implements the AdapterConverter interface for converting Llama LoRA (Low-Rank Adaptation) fine-tuning adapters to GGUF format. Unlike the model converters, it receives the base model's configuration (notably the attention head counts) through its KV method. The Tensors method compares each tensor's shape dimensions to detect tensors that need transposition, then assigns one of two repackers depending on the tensor layout: repack (head reordering only) or repackAndTranspose (head reordering plus transposition). Both repack functions perform the interleaved head-dimension reordering for the Q/K attention weight LoRA tensors.
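The shape comparison described above might look roughly like the sketch below. It is an illustration, not the code in convert_llama_adapter.go: the helper name needsTranspose, the weight.lora_a / weight.lora_b suffixes, and the direction of each comparison are assumptions, based on the reasoning that the LoRA rank is normally the smaller of the two dimensions.

```go
package main

import (
	"fmt"
	"strings"
)

// needsTranspose reports whether a 2-D LoRA tensor looks like it is stored in
// the opposite orientation. Hypothetical helper: it assumes lora_a is expected
// as [rank, in] and lora_b as [out, rank], with rank the smaller dimension.
func needsTranspose(name string, shape []uint64) bool {
	if len(shape) != 2 {
		return false
	}
	switch {
	case strings.HasSuffix(name, "weight.lora_a"):
		return shape[0] > shape[1]
	case strings.HasSuffix(name, "weight.lora_b"):
		return shape[0] < shape[1]
	}
	return false
}

func main() {
	// Tensor names here are made up for the example.
	fmt.Println(needsTranspose("blk.0.attn_q.weight.lora_a", []uint64{4096, 16}))
	fmt.Println(needsTranspose("blk.0.attn_q.weight.lora_a", []uint64{16, 4096}))
}
```

A tensor flagged this way would have its shape swapped and be routed to the transposing repacker; all others would take the plain repack path.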

Usage

Invoked when converting LoRA adapter weights for Llama-family models. The adapter converter reads base model configuration to determine attention head counts for correct Q/K repacking.
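As a rough sketch of how head counts could be carried over from the base model, the example below uses plain maps in place of the real fs.Config and KV types. The function name adapterKV is hypothetical, and although the three keys follow standard GGUF metadata naming, whether the converter uses exactly these keys is an assumption.

```go
package main

import "fmt"

// adapterKV builds adapter metadata from base-model metadata.
// Plain maps stand in for the real fs.Config and KV types (assumption).
func adapterKV(base map[string]any) map[string]any {
	kv := map[string]any{"general.architecture": "llama"}
	// Copy only the head counts; other base-model keys are not carried over.
	for _, key := range []string{
		"llama.attention.head_count",
		"llama.attention.head_count_kv",
	} {
		if v, ok := base[key]; ok {
			kv[key] = v
		}
	}
	return kv
}

func main() {
	base := map[string]any{
		"llama.attention.head_count":    uint32(32),
		"llama.attention.head_count_kv": uint32(8),
		"llama.context_length":          uint32(8192),
	}
	fmt.Println(adapterKV(base))
}
```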

Code Reference

Source Location

Repository: Ollama
File: convert/convert_llama_adapter.go
Lines: 1-170

Signature

type llamaAdapter struct {
    AdapterParameters
    NumAttentionHeads uint32 `json:"num_attention_heads"`
    NumKeyValueHeads  uint32 `json:"num_key_value_heads"`
}

func (p *llamaAdapter) KV(baseKV fs.Config) KV
func (p *llamaAdapter) Tensors(ts []Tensor) []*ggml.Tensor
func (p *llamaAdapter) Replacements() []string
func (p *llamaAdapter) repack(name string, data []float32, shape []uint64) ([]float32, error)
func (p *llamaAdapter) repackAndTranspose(name string, data []float32, shape []uint64) ([]float32, error)
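Replacements returns a flat []string. One plausible reading, assumed here, is that the slice holds old/new pairs suitable for the standard library's strings.NewReplacer. The pairs and the helper name ggufName in the sketch below are illustrative assumptions, not a copy of the list in the source file.

```go
package main

import (
	"fmt"
	"strings"
)

// replacements returns old/new pairs. These particular pairs are illustrative
// assumptions, not the list from convert_llama_adapter.go.
func replacements() []string {
	return []string{
		"base_model.model.", "",
		"model.layers", "blk",
		"self_attn.q_proj", "attn_q",
		"self_attn.k_proj", "attn_k",
		"lora_A.weight", "weight.lora_a",
		"lora_B.weight", "weight.lora_b",
	}
}

// ggufName rewrites an adapter tensor name using the pairs above.
func ggufName(name string) string {
	return strings.NewReplacer(replacements()...).Replace(name)
}

func main() {
	fmt.Println(ggufName("base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight"))
	// prints "blk.0.attn_q.weight.lora_a"
}
```

Putting the LoRA suffix after "weight" gives every converted name a fixed suffix, which makes suffix checks on the A and B tensors straightforward.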

Import

import "github.com/ollama/ollama/convert"

I/O Contract

Inputs

Name	Type	Required	Description
baseKV	fs.Config	Yes	Base model configuration for reading head counts
ts	[]Tensor	Yes	LoRA adapter tensors (lora_A and lora_B weights)

Outputs

Name	Type	Description
KV	KV	GGUF adapter metadata with llama architecture and head counts
Tensors	[]*ggml.Tensor	Converted LoRA tensors with correct shapes and repacked Q/K weights
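To illustrate what "correct shapes" can involve for a tensor that is stored transposed, the sketch below transposes a row-major matrix held in a flat []float32 and swaps its shape. The helper name transpose is hypothetical, and the real converter may perform this step through a tensor library rather than hand-written loops.

```go
package main

import "fmt"

// transpose returns the transpose of a row-major rows x cols matrix together
// with the swapped shape. Hypothetical helper for illustration only.
func transpose(data []float32, shape []uint64) ([]float32, []uint64, error) {
	if len(shape) != 2 || uint64(len(data)) != shape[0]*shape[1] {
		return nil, nil, fmt.Errorf("transpose: data length %d does not match shape %v", len(data), shape)
	}
	rows, cols := int(shape[0]), int(shape[1])
	out := make([]float32, len(data))
	for r := 0; r < rows; r++ {
		for c := 0; c < cols; c++ {
			// Element (r, c) of the input becomes element (c, r) of the output.
			out[c*rows+r] = data[r*cols+c]
		}
	}
	return out, []uint64{shape[1], shape[0]}, nil
}

func main() {
	out, shape, err := transpose([]float32{1, 2, 3, 4, 5, 6}, []uint64{2, 3})
	fmt.Println(out, shape, err)
}
```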

Usage Examples

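// Sketch only: llamaAdapter is unexported, so this applies inside package convert.
a := &llamaAdapter{}
// KV receives the base model's head counts, which Q/K repacking relies on.
kv := a.KV(baseModelConfig)
tensors := a.Tensors(adapterTensors)
// Q/K lora_A weights get interleaved head reordering.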
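The interleaved head reordering can be pictured as a row permutation. The sketch below assumes the permutation commonly used when converting Llama Q/K weights to GGUF: rows are viewed as [heads][2][half] and rewritten as [heads][half][2]. The function name permuteHeads is hypothetical, plain Go loops stand in for whatever tensor library the converter uses, and which axis of a LoRA tensor is treated as the head axis is not covered here.

```go
package main

import "fmt"

// permuteHeads reorders the rows of a row-major rows x cols matrix.
// Within each head, rows laid out as [2][half] are rewritten as [half][2],
// which interleaves the two halves of the head. Illustrative sketch only.
func permuteHeads(data []float32, rows, cols, heads int) ([]float32, error) {
	if heads <= 0 || rows%(heads*2) != 0 {
		return nil, fmt.Errorf("permuteHeads: %d rows not divisible by 2*heads (heads=%d)", rows, heads)
	}
	if len(data) != rows*cols {
		return nil, fmt.Errorf("permuteHeads: data length %d does not match %dx%d", len(data), rows, cols)
	}
	half := rows / heads / 2
	out := make([]float32, len(data))
	for h := 0; h < heads; h++ {
		for t := 0; t < 2; t++ {
			for i := 0; i < half; i++ {
				src := (h*2*half + t*half + i) * cols // index [h][t][i]
				dst := (h*2*half + i*2 + t) * cols    // index [h][i][t]
				copy(out[dst:dst+cols], data[src:src+cols])
			}
		}
	}
	return out, nil
}

func main() {
	// Two heads, four rows per head, one column; values are the source row indices.
	out, err := permuteHeads([]float32{0, 1, 2, 3, 4, 5, 6, 7}, 8, 1, 2)
	fmt.Println(out, err)
}
```

Tensors other than the Q/K ones would skip this step and pass through unchanged.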

Related Pages

Principle:Ollama_Ollama_GGUF_Model_Conversion_Llama_Adapter
