Implementation:Ollama Ollama ML Backend
| Knowledge Sources | |
|---|---|
| Domains | ML Abstraction, Tensor Operations |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Defines the core machine learning backend abstraction layer for Ollama, providing interfaces for tensor computation, model loading, and inference execution that decouple model logic from specific hardware implementations.
Description
The ml/backend.go file declares three foundational interfaces: Backend (model loading, configuration, tensor access, context creation, device enumeration), Context (tensor creation, graph computation, memory reservation, layer-specific contexts), and Tensor (arithmetic, matrix multiplication, attention, normalization, reshaping, and many other tensor operations). It uses a plugin registry pattern where backends register via RegisterBackend and are instantiated via NewBackend. BackendParams configures GPU layer offloading, thread count, and flash attention. CacheConfig controls cache padding, V tensor permutation, and mask dtype optimizations. The file also includes Dump utilities for debugging tensor values.
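The registry pattern described above can be illustrated with a minimal, self-contained sketch. The names here (`Params`, `Factory`, `Register`, `New`, `cpuBackend`) are hypothetical stand-ins, not the identifiers in ml/backend.go. How `NewBackend` selects a registered name is not described on this page, so the sketch takes the name as an explicit argument.

```go
package main

import (
	"fmt"
	"sync"
)

// Params is a hypothetical stand-in for ml.BackendParams.
type Params struct {
	NumThreads   int
	NumGPULayers int
}

// Backend is a hypothetical, heavily reduced stand-in for ml.Backend.
type Backend interface {
	Name() string
}

// Factory mirrors the shape of the constructor passed to RegisterBackend.
type Factory func(modelPath string, params Params) (Backend, error)

var (
	mu       sync.RWMutex
	registry = map[string]Factory{}
)

// Register stores a factory under a name and rejects duplicates.
func Register(name string, f Factory) error {
	mu.Lock()
	defer mu.Unlock()
	if _, ok := registry[name]; ok {
		return fmt.Errorf("backend %q already registered", name)
	}
	registry[name] = f
	return nil
}

// New looks up a factory by name and uses it to build a backend.
func New(name, modelPath string, params Params) (Backend, error) {
	mu.RLock()
	f, ok := registry[name]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("backend %q not registered", name)
	}
	return f(modelPath, params)
}

// cpuBackend is a toy implementation used only for this sketch.
type cpuBackend struct{ path string }

func (b cpuBackend) Name() string { return "cpu:" + b.path }

func main() {
	_ = Register("cpu", func(p string, _ Params) (Backend, error) {
		return cpuBackend{path: p}, nil
	})
	b, err := New("cpu", "model.gguf", Params{NumThreads: 4})
	if err != nil {
		panic(err)
	}
	fmt.Println(b.Name())
}
```

Callers in this arrangement depend only on the interface and the registry, so a new backend can be added without touching them.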
Usage
All model implementations program against these interfaces, enabling the same model code to run on different hardware backends (CPU, CUDA, Metal, Vulkan, etc.) via the GGML backend implementation. The NewBackend function is called when loading a model.
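As a rough illustration of what programming against the interfaces means, the sketch below writes a tiny piece of "model code" once and runs it on a toy slice-backed implementation. The reduced `Tensor` and `Backend` interfaces, `sliceTensor`, `mapBackend`, and `scaleShift` are all assumptions made for the example. The real interfaces carry a `Context` argument and many more methods.

```go
package main

import "fmt"

// Tensor is a hypothetical reduction of the real interface.
type Tensor interface {
	Add(t2 Tensor) Tensor
	Mul(t2 Tensor) Tensor
	Floats() []float32
}

// Backend is a hypothetical reduction: it only hands out named weights.
type Backend interface {
	Get(name string) Tensor
}

// scaleShift is "model code": it depends only on the interfaces above.
func scaleShift(b Backend, x Tensor) Tensor {
	return x.Mul(b.Get("scale")).Add(b.Get("shift"))
}

// sliceTensor is a toy CPU implementation backed by a Go slice.
// It assumes both operands have the same length.
type sliceTensor []float32

func (t sliceTensor) Floats() []float32 { return t }

func (t sliceTensor) zip(t2 Tensor, f func(a, b float32) float32) Tensor {
	o := t2.Floats()
	out := make(sliceTensor, len(t))
	for i := range t {
		out[i] = f(t[i], o[i])
	}
	return out
}

func (t sliceTensor) Add(t2 Tensor) Tensor {
	return t.zip(t2, func(a, b float32) float32 { return a + b })
}

func (t sliceTensor) Mul(t2 Tensor) Tensor {
	return t.zip(t2, func(a, b float32) float32 { return a * b })
}

// mapBackend is a toy backend that stores weights in a map.
type mapBackend map[string]Tensor

func (m mapBackend) Get(name string) Tensor { return m[name] }

func main() {
	b := mapBackend{
		"scale": sliceTensor{2, 2, 2},
		"shift": sliceTensor{1, 1, 1},
	}
	fmt.Println(scaleShift(b, sliceTensor{1, 2, 3}).Floats()) // [3 5 7]
}
```

Replacing `sliceTensor` and `mapBackend` with a GPU-backed pair would leave `scaleShift` unchanged, which is the decoupling this section describes.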
Code Reference
Source Location
- Repository: Ollama
- File: ml/backend.go
- Lines: 1-412
Signature
type Backend interface {
	Config() config
	Get(name string) Tensor
	NewContext() Context
	Devices() []Device
	// ...
}

type Context interface {
	Zeros(dtype DType, shape ...int) Tensor
	FromFloatSlice(s []float32, shape ...int) (Tensor, error)
	Forward(tensors ...Tensor) Context
	Compute(tensors ...Tensor)
	MaxGraphNodes() int
	Close()
}

type Tensor interface {
	Dim(n int) int
	Add(ctx Context, t2 Tensor) Tensor
	Mul(ctx Context, t2 Tensor) Tensor
	MatMul(ctx Context, t2 Tensor) Tensor
	ScaledDotProductAttention(ctx Context, key, value, mask Tensor, scale float64) Tensor
	RMSNorm(ctx Context, w Tensor, eps float32) Tensor
	// ... many more operations
}

func RegisterBackend(name string, f func(string, BackendParams) (Backend, error))
func NewBackend(modelPath string, params BackendParams) (Backend, error)
Import
import "github.com/ollama/ollama/ml"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
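| modelPath | string | Yes | Path to the GGUF model file (passed to NewBackend) |
| params | BackendParams | Yes | Backend configuration (GPU layers, threads, flash attention) (passed to NewBackend) |
| name | string | Yes | Backend name for registration, e.g., "ggml" (passed to RegisterBackend) |
| f | func(string, BackendParams) (Backend, error) | Yes | Constructor stored under name (passed to RegisterBackend) |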
Outputs
| Name | Type | Description |
|---|---|---|
| Backend | interface | Loaded model backend with tensor access and context creation |
| Context | interface | Computation context for building and executing tensor graphs |
| Tensor | interface | Multi-dimensional array with arithmetic and ML operations |
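The phrase "building and executing tensor graphs" can be illustrated with a toy, scalar-only sketch in which operations only record nodes, and values are produced when the context computes. Every name below (`node`, `Context`, `Const`, `eval`) is hypothetical. The method names `Forward` and `Compute` are borrowed from the signature above, but their real semantics are assumed here rather than confirmed.

```go
package main

import "fmt"

// node is one deferred operation in a toy scalar graph.
type node struct {
	op   string // "const", "add" or "mul"
	val  float32
	a, b *node
	done bool
}

// Context collects nodes for later execution (hypothetical, scalar-only).
type Context struct{ pending []*node }

func (c *Context) Const(v float32) *node { return &node{op: "const", val: v, done: true} }
func (c *Context) Add(a, b *node) *node  { return &node{op: "add", a: a, b: b} }
func (c *Context) Mul(a, b *node) *node  { return &node{op: "mul", a: a, b: b} }

// Forward schedules nodes; nothing is evaluated yet.
func (c *Context) Forward(ns ...*node) *Context {
	c.pending = append(c.pending, ns...)
	return c
}

// Compute evaluates every scheduled node and its dependencies.
func (c *Context) Compute() {
	for _, n := range c.pending {
		eval(n)
	}
	c.pending = nil
}

// eval recursively evaluates a node once and caches the result.
func eval(n *node) float32 {
	if n.done {
		return n.val
	}
	a, b := eval(n.a), eval(n.b)
	if n.op == "add" {
		n.val = a + b
	} else {
		n.val = a * b
	}
	n.done = true
	return n.val
}

func main() {
	ctx := &Context{}
	y := ctx.Add(ctx.Mul(ctx.Const(3), ctx.Const(4)), ctx.Const(5))
	fmt.Println(y.done) // false: only the graph exists so far
	ctx.Forward(y).Compute()
	fmt.Println(y.val) // 17
}
```

Separating graph construction from execution in this way lets a backend see the whole computation before running it, which is one plausible reason for the `Forward`/`Compute` split.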
Usage Examples
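// Load a model using the backend abstraction
backend, err := ml.NewBackend("/path/to/model.gguf", ml.BackendParams{
	NumGPULayers:   35,
	NumThreads:     8,
	FlashAttention: true,
})
if err != nil {
	log.Fatal(err) // requires the standard "log" package
}

// Create a computation context and release it when done
ctx := backend.NewContext()
defer ctx.Close()

// Get a weight tensor by name and apply it to an input.
// input is assumed to be an ml.Tensor created earlier (not shown here).
weight := backend.Get("blk.0.attn_q.weight")
result := input.MatMul(ctx, weight)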