Implementation:Ollama Ollama ML Backend

From Leeroopedia
Revision as of 13:27, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Ollama_Ollama_ML_Backend.md)
Knowledge Sources
Domains: ML Abstraction, Tensor Operations
Last Updated: 2025-02-15 00:00 GMT

Overview

Defines the core machine learning backend abstraction layer for Ollama, providing interfaces for tensor computation, model loading, and inference execution that decouple model logic from specific hardware implementations.

Description

The ml/backend.go file declares three foundational interfaces:

  • Backend: model loading, configuration, tensor access, context creation, and device enumeration.
  • Context: tensor creation, graph computation, memory reservation, and layer-specific contexts.
  • Tensor: arithmetic, matrix multiplication, attention, normalization, reshaping, and many other tensor operations.

Backends follow a plugin registry pattern: each implementation registers itself via RegisterBackend and is instantiated via NewBackend. BackendParams configures GPU layer offloading, thread count, and flash attention. CacheConfig controls cache padding, V tensor permutation, and mask dtype optimizations. The file also includes Dump utilities for debugging tensor values.
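The registry pattern described above can be sketched in a few lines. This is a minimal, self-contained illustration and not the contents of ml/backend.go: the one-method Backend interface, the fakeBackend type, the mutex, the panic on duplicate registration, and the choice to always look up the name "ggml" in NewBackend are all assumptions made for the example. Only the RegisterBackend and NewBackend signatures and the BackendParams field names come from this page.

```go
package main

import (
	"fmt"
	"sync"
)

// BackendParams carries the configuration fields named on this page.
type BackendParams struct {
	NumGPULayers   int
	NumThreads     int
	FlashAttention bool
}

// Backend is reduced to a single hypothetical method for this sketch.
type Backend interface {
	Name() string
}

// constructor has the same shape as the function accepted by RegisterBackend.
type constructor func(modelPath string, params BackendParams) (Backend, error)

var (
	mu       sync.RWMutex
	backends = map[string]constructor{}
)

// RegisterBackend stores a constructor under a name.
// Panicking on a duplicate name is an assumption of this sketch.
func RegisterBackend(name string, f constructor) {
	mu.Lock()
	defer mu.Unlock()
	if _, ok := backends[name]; ok {
		panic("backend already registered: " + name)
	}
	backends[name] = f
}

// defaultBackend is an assumption: the sketch always instantiates "ggml".
const defaultBackend = "ggml"

// NewBackend looks up the registered constructor and calls it.
func NewBackend(modelPath string, params BackendParams) (Backend, error) {
	mu.RLock()
	f, ok := backends[defaultBackend]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("backend %q is not registered", defaultBackend)
	}
	return f(modelPath, params)
}

// fakeBackend is a hypothetical stand-in for a real implementation.
type fakeBackend struct{ path string }

func (b fakeBackend) Name() string { return "fake:" + b.path }

func main() {
	// A real backend would typically register itself from an init function.
	RegisterBackend("ggml", func(p string, _ BackendParams) (Backend, error) {
		return fakeBackend{path: p}, nil
	})

	b, err := NewBackend("/path/to/model.gguf", BackendParams{NumThreads: 8})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(b.Name())
}
```

The point of the indirection is that callers depend only on NewBackend and the Backend interface, so an implementation can be swapped by changing which package registers itself.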

Usage

All model implementations program against these interfaces, enabling the same model code to run on different hardware backends (CPU, CUDA, Metal, Vulkan, etc.) via the GGML backend implementation. The NewBackend function is called when loading a model.
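To illustrate what programming against the interfaces means in practice, here is a small sketch in which the "model code" (scaleShift) sees only a Tensor interface. The trimmed Tensor and Context definitions, the cpuTensor type, and scaleShift are hypothetical names for this example; cpuTensor evaluates eagerly and assumes equal-length inputs, which is not how a graph-based backend would behave.

```go
package main

import "fmt"

// Context is an empty placeholder here; the real interface carries graph state.
type Context interface{}

// Tensor keeps only two of the operations listed on this page.
type Tensor interface {
	Add(ctx Context, t2 Tensor) Tensor
	Mul(ctx Context, t2 Tensor) Tensor
}

// cpuTensor is a hypothetical eager, element-wise implementation.
// It assumes both operands are cpuTensor values of the same length.
type cpuTensor []float32

func (t cpuTensor) Add(_ Context, t2 Tensor) Tensor {
	o := t2.(cpuTensor)
	out := make(cpuTensor, len(t))
	for i := range t {
		out[i] = t[i] + o[i]
	}
	return out
}

func (t cpuTensor) Mul(_ Context, t2 Tensor) Tensor {
	o := t2.(cpuTensor)
	out := make(cpuTensor, len(t))
	for i := range t {
		out[i] = t[i] * o[i]
	}
	return out
}

// scaleShift is the "model code": it never names a concrete tensor type,
// so any other Tensor implementation could be passed in unchanged.
func scaleShift(ctx Context, x, w, b Tensor) Tensor {
	return x.Mul(ctx, w).Add(ctx, b)
}

func main() {
	x := cpuTensor{1, 2, 3}
	w := cpuTensor{2, 2, 2}
	b := cpuTensor{1, 1, 1}
	fmt.Println(scaleShift(nil, x, w, b)) // prints [3 5 7]
}
```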

Code Reference

Source Location

  • Repository: Ollama
  • File: ml/backend.go
  • Lines: 1-412

Signature

type Backend interface {
    Config() config
    Get(name string) Tensor
    NewContext() Context
    Devices() []Device
    // ...
}

type Context interface {
    Zeros(dtype DType, shape ...int) Tensor
    FromFloatSlice(s []float32, shape ...int) (Tensor, error)
    Forward(tensors ...Tensor) Context
    Compute(tensors ...Tensor)
    MaxGraphNodes() int
    Close()
}

type Tensor interface {
    Dim(n int) int
    Add(ctx Context, t2 Tensor) Tensor
    Mul(ctx Context, t2 Tensor) Tensor
    MatMul(ctx Context, t2 Tensor) Tensor
    ScaledDotProductAttention(ctx Context, key, value, mask Tensor, scale float64) Tensor
    RMSNorm(ctx Context, w Tensor, eps float32) Tensor
    // ... many more operations
}

func RegisterBackend(name string, f func(string, BackendParams) (Backend, error))
func NewBackend(modelPath string, params BackendParams) (Backend, error)

Import

import "github.com/ollama/ollama/ml"

I/O Contract

Inputs

Name       Type           Required  Description
modelPath  string         Yes       Path to the GGUF model file
params     BackendParams  Yes       Backend configuration (GPU layers, threads, flash attention)
name       string         Yes       Backend name for registration (e.g., "ggml")

Outputs

Name     Type       Description
Backend  interface  Loaded model backend with tensor access and context creation
Context  interface  Computation context for building and executing tensor graphs
Tensor   interface  Multi-dimensional array with arithmetic and ML operations
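The phrase "building and executing tensor graphs" implies that operations are recorded first and run later. The toy below sketches that deferred-execution idea with scalar values. It is an assumption-laden illustration, not the GGML-backed implementation: node, graph, constant, and add are hypothetical names, and Compute takes no arguments here, unlike the Compute(tensors ...Tensor) signature listed on this page.

```go
package main

import "fmt"

// node is a hypothetical lazy value: it records how to produce a result
// and caches that result once it has been evaluated.
type node struct {
	eval  func() float32
	value float32
	done  bool
}

// result evaluates the node on first use and returns the cached value afterwards.
func (n *node) result() float32 {
	if !n.done {
		n.value, n.done = n.eval(), true
	}
	return n.value
}

// constant wraps a fixed value in a lazy node.
func constant(v float32) *node {
	return &node{eval: func() float32 { return v }}
}

// add records an addition without executing it.
func add(a, b *node) *node {
	return &node{eval: func() float32 { return a.result() + b.result() }}
}

// graph plays the role of Context in this sketch.
type graph struct {
	pending []*node
}

// Forward schedules nodes for evaluation and returns the graph for chaining.
func (g *graph) Forward(ns ...*node) *graph {
	g.pending = append(g.pending, ns...)
	return g
}

// Compute evaluates everything scheduled so far, then clears the schedule.
func (g *graph) Compute() {
	for _, n := range g.pending {
		n.result()
	}
	g.pending = nil
}

func main() {
	g := &graph{}
	sum := add(constant(2), constant(3))
	fmt.Println("evaluated before Compute:", sum.done)

	g.Forward(sum).Compute()
	fmt.Println("evaluated after Compute:", sum.done, "value:", sum.value)
}
```

Separating graph construction from execution is what lets a backend plan memory and device placement before any arithmetic runs.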

Usage Examples

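// Load a model using the backend abstraction
backend, err := ml.NewBackend("/path/to/model.gguf", ml.BackendParams{
    NumGPULayers:   35,
    NumThreads:     8,
    FlashAttention: true,
})
if err != nil {
    // Requires the standard "log" package
    log.Fatalf("loading backend: %v", err)
}

// Create a computation context and release it when done
ctx := backend.NewContext()
defer ctx.Close()

// Look up a weight tensor by name and multiply it with an input.
// input is assumed to be an ml.Tensor created earlier in the caller.
weight := backend.Get("blk.0.attn_q.weight")
result := input.MatMul(ctx, weight)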
