Implementation:Ollama Ollama Scheduler GetRunner

Knowledge Sources	Ollama
Domains	Systems, GPU_Computing, Model_Serving
Last Updated	2026-02-14 00:00 GMT

Overview

Concrete tool, provided by the Ollama scheduler, for obtaining a loaded model inference runner.

Description

Scheduler.GetRunner is the primary interface for requesting a model runner. It checks whether the requested model is already loaded and compatible with the request parameters. If so, it delivers the existing runner immediately on the returned success channel. Otherwise, it enqueues the request for the scheduler's background goroutines to process, and the result arrives later on one of the two returned channels.
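
The fast-path/enqueue decision described above can be sketched in a few lines of Go. This is a minimal illustration, not the code in server/sched.go: toyScheduler, toyRunner, toyRequest, and errQueueFull are hypothetical names, "compatible" is reduced to a context-size comparison, and rejecting requests when the queue is full is an assumption of the sketch.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// toyRunner stands in for a loaded model runner (hypothetical type).
type toyRunner struct {
	modelPath string
	numCtx    int
}

// toyRequest carries the reply channels for one caller (hypothetical type).
type toyRequest struct {
	modelPath string
	numCtx    int
	successCh chan *toyRunner
	errCh     chan error
}

// errQueueFull is an assumed error for a saturated pending queue.
var errQueueFull = errors.New("pending queue is full")

// toyScheduler is a simplified stand-in for the scheduler.
type toyScheduler struct {
	mu        sync.Mutex
	loaded    map[string]*toyRunner
	pendingCh chan *toyRequest
}

func newToyScheduler(queueSize int) *toyScheduler {
	return &toyScheduler{
		loaded:    make(map[string]*toyRunner),
		pendingCh: make(chan *toyRequest, queueSize),
	}
}

// getRunner answers from the loaded map when a compatible runner exists,
// otherwise it hands the request to the background queue.
func (s *toyScheduler) getRunner(modelPath string, numCtx int) (chan *toyRunner, chan error) {
	req := &toyRequest{
		modelPath: modelPath,
		numCtx:    numCtx,
		successCh: make(chan *toyRunner, 1), // buffered so a send never blocks
		errCh:     make(chan error, 1),
	}

	s.mu.Lock()
	r := s.loaded[modelPath]
	s.mu.Unlock()

	// Assumed compatibility rule: the loaded context must be large enough.
	if r != nil && r.numCtx >= numCtx {
		req.successCh <- r // fast path: reuse the loaded runner
		return req.successCh, req.errCh
	}

	select {
	case s.pendingCh <- req: // slow path: a background goroutine would load it
	default:
		req.errCh <- errQueueFull
	}
	return req.successCh, req.errCh
}

func main() {
	s := newToyScheduler(1)
	s.loaded["llama"] = &toyRunner{modelPath: "llama", numCtx: 4096}

	okCh, _ := s.getRunner("llama", 2048)
	fmt.Println("fast path:", (<-okCh).modelPath)

	s.getRunner("mistral", 2048)           // fills the one-slot queue
	_, errCh := s.getRunner("gemma", 2048) // queue is already full
	fmt.Println("overflow:", <-errCh)
}
```

Both reply channels are buffered with capacity one in this sketch so that neither the fast path nor a background loader can block on a caller that has already gone away.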

The scheduler's background goroutines (processPending and processCompleted) handle model loading, GPU memory management, and eviction. NewLlamaServer creates the actual inference process, either as a llama.cpp subprocess or as a Go-native runner. GPUDevices discovers the available GPU hardware and its VRAM.

Usage

Called internally by GenerateHandler and ChatHandler to obtain a model runner before processing an inference request. Not directly accessible to API clients.

Code Reference

Source Location

Repository: ollama
File: server/sched.go (Scheduler.GetRunner), llm/server.go (NewLlamaServer), discover/runner.go (GPUDevices)
Lines: sched.go:L87-120 (GetRunner), server.go:L144-420 (NewLlamaServer), runner.go:L34-504 (GPUDevices)

Signature

func (s *Scheduler) GetRunner(
    c context.Context,
    m *Model,
    opts api.Options,
    sessionDuration *api.Duration,
    useImagegen bool,
) (chan *runnerRef, chan error)

func NewLlamaServer(
    systemInfo ml.SystemInfo,
    gpus []ml.DeviceInfo,
    modelPath string,
    f *ggml.GGML,
    adapters, projectors []string,
    opts api.Options,
    numParallel int,
) (LlamaServer, error)

Import

import "github.com/ollama/ollama/server"
import "github.com/ollama/ollama/llm"

I/O Contract

Inputs

Name	Type	Required	Description
c	context.Context	Yes	Request context for cancellation
m	*Model	Yes	Parsed model with weights path, adapter paths, projector paths
opts	api.Options	Yes	Runtime options: num_ctx, num_gpu, temperature, etc.
sessionDuration	*api.Duration	No (may be nil)	Keep-alive duration for the runner (default: 5 minutes)
useImagegen	bool	No (pass false)	Whether to use image generation mode

Outputs

Name	Type	Description
successCh	chan *runnerRef	Channel that receives the loaded runner reference
errCh	chan error	Channel that receives error if loading fails

Usage Examples

Internal Usage in ChatHandler

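// From server/routes.go ChatHandler
// GetRunner returns at once; the outcome arrives on exactly one of the two channels.
rCh, eCh := s.sched.GetRunner(c.Request.Context(), model, opts, req.KeepAlive, false)
var runner *runnerRef
select {
case runner = <-rCh:
    // The model is loaded and the runner is ready.
case err := <-eCh:
    // Loading failed: report the error and stop handling the request.
    handleError(c, err)
    return
}
// Use runner for inference...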

Related Pages

Implements Principle

Principle:Ollama_Ollama_Model_Loading_And_GPU_Scheduling

Requires Environment

Environment:Ollama_Ollama_GPU_Runtime
