Principle:Ollama Ollama Model Adaptation

Knowledge Sources	Ollama; LoRA: Low-Rank Adaptation of Large Language Models
Domains	Fine-Tuning, Model Adaptation
Last Updated	2025-02-15 00:00 GMT

Overview

Model Adaptation enables the integration of LoRA (Low-Rank Adaptation) and other adapter-based fine-tuning methods into the Ollama inference pipeline, allowing users to apply lightweight parameter modifications on top of base models without duplicating the full model weights.

Core Concepts

Low-Rank Adaptation (LoRA)

LoRA decomposes the weight update into two low-rank matrices A and B, so the adapted weight is W' = W + (alpha / r) * (B @ A). Here W is the frozen pretrained weight of shape d x k, B has shape d x r, A has shape r x k, and the rank r is much smaller than d and k, so the product B @ A approximates the fine-tuning delta with far fewer values. Only A and B are trained. For each adapted matrix this cuts the trainable parameters from d * k to r * (d + k) and shrinks the stored adapter accordingly, while still allowing the model's behavior to be specialized for specific tasks or domains.
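A minimal Go sketch of the update formula on tiny matrices, plus the parameter-count arithmetic. The Matrix type and the MergeLoRA and ParamCounts helpers are illustrative names, not Ollama APIs, and the code uses plain slices rather than any tensor library.

```go
package main

import "fmt"

// Matrix is a dense row-major matrix stored as a slice of rows.
type Matrix [][]float64

// matMul returns the product x*y; it assumes len(x[0]) == len(y).
func matMul(x, y Matrix) Matrix {
	out := make(Matrix, len(x))
	for i := range x {
		out[i] = make([]float64, len(y[0]))
		for j := range y[0] {
			for k := range y {
				out[i][j] += x[i][k] * y[k][j]
			}
		}
	}
	return out
}

// MergeLoRA computes W' = W + (alpha/r) * (B @ A), where W is d x k,
// B is d x r and A is r x k. The rank r is read from the row count of A.
func MergeLoRA(W, A, B Matrix, alpha float64) Matrix {
	r := len(A)
	scale := alpha / float64(r)
	delta := matMul(B, A) // d x k low-rank update
	out := make(Matrix, len(W))
	for i := range W {
		out[i] = make([]float64, len(W[i]))
		for j := range W[i] {
			out[i][j] = W[i][j] + scale*delta[i][j]
		}
	}
	return out
}

// ParamCounts returns the number of values in a full d x k update
// versus a rank-r LoRA update (A and B together).
func ParamCounts(d, k, r int) (full, lora int) {
	return d * k, r * (d + k)
}

func main() {
	W := Matrix{{1, 0}, {0, 1}} // 2x2 identity standing in for a pretrained weight
	B := Matrix{{1}, {2}}       // d x r = 2x1
	A := Matrix{{3, 4}}         // r x k = 1x2, so r = 1
	fmt.Println(MergeLoRA(W, A, B, 2)) // alpha = 2, so alpha/r = 2

	full, lora := ParamCounts(4096, 4096, 8)
	fmt.Println(full, lora)
}
```

For a 4096 x 4096 projection with r = 8, ParamCounts gives 16,777,216 values for a full update versus 65,536 for A and B combined, a 256x reduction.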

Adapter Stacking

Multiple adapters can be stacked on a single base model: each contributes its own scaled low-rank update, and the updates add together. During inference, adapter weights are either merged into the base model weights at load time (weight merging) or applied dynamically during the forward pass. Weight merging is simpler and adds no per-token compute, while dynamic application costs two extra low-rank matrix products per adapted weight matrix but allows adapters to be switched without reloading the base model.
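The sketch below checks that the two strategies produce the same output for two stacked adapters. Dynamic application evaluates W x plus each scaled B (A x), while merging folds every scaled B A into W once. The names Adapter, ForwardDynamic and MergeAll are hypothetical and not part of Ollama; Scale is assumed to already hold alpha / r.

```go
package main

import (
	"fmt"
	"math"
)

// Adapter holds one LoRA update: A is r x k, B is d x r, Scale = alpha/r.
type Adapter struct {
	A, B  [][]float64
	Scale float64
}

// matVec returns m*v.
func matVec(m [][]float64, v []float64) []float64 {
	out := make([]float64, len(m))
	for i, row := range m {
		for j, x := range row {
			out[i] += x * v[j]
		}
	}
	return out
}

// ForwardDynamic computes W x + sum_i s_i * B_i (A_i x) without modifying W.
func ForwardDynamic(W [][]float64, adapters []Adapter, x []float64) []float64 {
	h := matVec(W, x)
	for _, ad := range adapters {
		delta := matVec(ad.B, matVec(ad.A, x)) // A x is only r long
		for i := range h {
			h[i] += ad.Scale * delta[i]
		}
	}
	return h
}

// MergeAll returns a new matrix W + sum_i s_i * B_i A_i (load-time merge).
func MergeAll(W [][]float64, adapters []Adapter) [][]float64 {
	out := make([][]float64, len(W))
	for i := range W {
		out[i] = append([]float64(nil), W[i]...)
	}
	for _, ad := range adapters {
		r := len(ad.A)
		for i := range out {
			for j := range out[i] {
				for t := 0; t < r; t++ {
					out[i][j] += ad.Scale * ad.B[i][t] * ad.A[t][j]
				}
			}
		}
	}
	return out
}

func main() {
	W := [][]float64{{1, 2}, {3, 4}}
	adapters := []Adapter{
		{A: [][]float64{{1, 0}}, B: [][]float64{{1}, {1}}, Scale: 0.5},
		{A: [][]float64{{0, 1}}, B: [][]float64{{2}, {0}}, Scale: 1},
	}
	x := []float64{1, 1}
	dyn := ForwardDynamic(W, adapters, x)
	merged := matVec(MergeAll(W, adapters), x)
	for i := range dyn {
		fmt.Println(dyn[i], merged[i], math.Abs(dyn[i]-merged[i]) < 1e-12)
	}
}
```

Computing B (A x) rather than (B A) x keeps the dynamic path cheap, because it never materializes the d x k delta; the merged path pays that cost once and then runs at base-model speed.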

Adapter Conversion

Adapters trained in external frameworks (e.g., Hugging Face PEFT) must be converted to Ollama's internal format. The conversion process maps adapter tensor names to the base model's tensor naming convention, verifies that the adapter's rank and dimensions are compatible with the base model, and packages the adapter weights together with the metadata needed to apply them correctly during inference.
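A minimal sketch of the renaming and shape checks, assuming PyTorch-style shapes (A is r x k, B is d x r) and a deliberately tiny, hypothetical rename table. The actual tensor names and mappings used by Ollama's converters may differ and cover many more tensors; peftToGGUF and checkLoRAShapes are illustrative names.

```go
package main

import (
	"fmt"
	"strings"
)

// peftToGGUF is a HYPOTHETICAL rename table from Hugging Face PEFT tensor
// names to GGUF-style names; it is not Ollama's actual mapping.
var peftToGGUF = strings.NewReplacer(
	"base_model.model.model.layers.", "blk.",
	"self_attn.q_proj", "attn_q",
	"self_attn.v_proj", "attn_v",
	"lora_A.weight", "weight.lora_a",
	"lora_B.weight", "weight.lora_b",
)

// checkLoRAShapes verifies that A (r x k) and B (d x r) agree on the rank r
// and that B*A has the same d x k shape as the base weight.
func checkLoRAShapes(a, b, base [2]int) error {
	if a[0] != b[1] {
		return fmt.Errorf("rank mismatch: A has %d rows, B has %d columns", a[0], b[1])
	}
	if b[0] != base[0] || a[1] != base[1] {
		return fmt.Errorf("B*A is %dx%d but base weight is %dx%d", b[0], a[1], base[0], base[1])
	}
	return nil
}

func main() {
	name := "base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight"
	fmt.Println(peftToGGUF.Replace(name))

	// Rank-8 adapter on a 4096 x 4096 projection: A is 8 x 4096, B is 4096 x 8.
	fmt.Println(checkLoRAShapes([2]int{8, 4096}, [2]int{4096, 8}, [2]int{4096, 4096}))
	// A and B disagree on the rank, so the pair is rejected.
	fmt.Println(checkLoRAShapes([2]int{8, 4096}, [2]int{4096, 16}, [2]int{4096, 4096}))
}
```

Rejecting mismatched pairs at conversion time means a bad adapter fails when it is imported rather than when the model is loaded for inference.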

Scale Factor

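The adapter's effect is controlled by a scale factor alpha / r, where alpha is a hyperparameter fixed at training time and r is the adapter rank; this factor sets the magnitude of the low-rank update. Because the update enters the weights linearly, multiplying it by an additional strength factor tunes the adaptation without retraining: a factor of 0 recovers the base model's weights, a factor of 1 reproduces the trained adapter, and intermediate values give a continuous spectrum between the two.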
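A sketch of the scaling arithmetic, assuming alpha = 16 and r = 8 (a trained scale of 2). The strength multiplier and the helpers effectiveScale and applyDelta are illustrative assumptions, not documented Ollama options.

```go
package main

import "fmt"

// effectiveScale combines the trained scale alpha/r with a HYPOTHETICAL
// user-chosen strength multiplier (0 = base model, 1 = trained adapter).
func effectiveScale(alpha float64, rank int, strength float64) float64 {
	return strength * alpha / float64(rank)
}

// applyDelta returns w + s*delta element-wise for one weight row.
func applyDelta(w, delta []float64, s float64) []float64 {
	out := make([]float64, len(w))
	for i := range w {
		out[i] = w[i] + s*delta[i]
	}
	return out
}

func main() {
	w := []float64{1.0, -2.0}     // one row of the base weight W
	delta := []float64{0.5, 0.25} // the same row of B @ A
	for _, strength := range []float64{0, 0.5, 1} {
		s := effectiveScale(16, 8, strength) // alpha = 16, r = 8
		fmt.Println(strength, s, applyDelta(w, delta, s))
	}
}
```

With strength 0 the row stays at the base values [1, -2]; with strength 1 it becomes [2, -1.5], the fully adapted row.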

Implementation Notes

Adapter support in the Ollama codebase spans conversion and runtime. Adapter converters live in convert/convert_llama_adapter.go (Llama-family models) and convert/convert_gemma2_adapter.go (Gemma 2 models). At runtime, adapter weights are loaded alongside the base model and merged into the model's weight tensors during the loading phase. The Modelfile ADAPTER directive specifies the adapter to apply, and the model creation pipeline (ollama create) packages references to both the base model and the adapter into the resulting manifest.
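Load-time merging can be sketched as a single pass over named tensors. This sketch assumes a hypothetical convention in which the pair for base tensor X is stored as X.lora_a (r x k) and X.lora_b (d x r), and that shapes were already validated during conversion. mergeAdapter is an illustrative helper, not Ollama's loader.

```go
package main

import (
	"fmt"
	"strings"
)

// Tensor is a dense row-major 2-D weight.
type Tensor [][]float64

// mergeAdapter folds every LoRA pair in adapter into the matching base tensor
// in place. It ASSUMES the hypothetical "X.lora_a" / "X.lora_b" naming and
// that scale already equals alpha/r.
func mergeAdapter(model, adapter map[string]Tensor, scale float64) error {
	for name, a := range adapter {
		if !strings.HasSuffix(name, ".lora_a") {
			continue // each pair is handled once, from its lora_a entry
		}
		base := strings.TrimSuffix(name, ".lora_a")
		w, ok := model[base]
		if !ok {
			return fmt.Errorf("adapter tensor %q has no base tensor %q", name, base)
		}
		b, ok := adapter[base+".lora_b"]
		if !ok {
			return fmt.Errorf("missing %q", base+".lora_b")
		}
		for i := range w {
			for j := range w[i] {
				for t := range a { // t runs over the rank dimension
					w[i][j] += scale * b[i][t] * a[t][j]
				}
			}
		}
	}
	return nil
}

func main() {
	model := map[string]Tensor{
		"blk.0.attn_q.weight": {{1, 0}, {0, 1}},
		"blk.0.attn_k.weight": {{2, 0}, {0, 2}}, // no adapter: left unchanged
	}
	adapter := map[string]Tensor{
		"blk.0.attn_q.weight.lora_a": {{1, 1}},   // r = 1, k = 2
		"blk.0.attn_q.weight.lora_b": {{1}, {0}}, // d = 2, r = 1
	}
	if err := mergeAdapter(model, adapter, 2); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(model["blk.0.attn_q.weight"], model["blk.0.attn_k.weight"])
}
```

After the pass the adapter tensors are no longer needed, which is why a merged model runs at the same speed as the base model.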

Related Pages

Implementation:Ollama_Ollama_Llama_Adapter
