Implementation:Ollama Ollama Scheduler GetRunner

From Leeroopedia
Knowledge Sources
Domains Systems, GPU_Computing, Model_Serving
Last Updated 2026-02-14 00:00 GMT

Overview

Concrete tool, provided by the Ollama scheduler, for obtaining an inference runner for a loaded model.

Description

Scheduler.GetRunner is the primary interface for requesting a model runner. It checks if the requested model is already loaded and compatible with the request parameters. If so, it returns the existing runner immediately through a channel. Otherwise, it enqueues the request for the scheduler's background goroutines to process.
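The reuse-or-enqueue behavior described above can be sketched roughly as follows. This is a minimal illustration, not Ollama's code: the `scheduler`, `runner`, and `request` types, the `getRunner` method, the compatibility check on `numCtx` alone, and the `errQueueFull` error are all hypothetical stand-ins chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errQueueFull is a hypothetical error for a saturated pending queue.
var errQueueFull = errors.New("pending queue is full")

// runner stands in for a loaded model process (hypothetical type).
type runner struct {
	model  string
	numCtx int
}

// request carries what a background loop would need to load a model.
type request struct {
	model     string
	numCtx    int
	successCh chan *runner
	errCh     chan error
}

// scheduler is a toy stand-in, not Ollama's Scheduler.
type scheduler struct {
	mu      sync.Mutex
	loaded  map[string]*runner
	pending chan *request
}

func newScheduler(queueSize int) *scheduler {
	return &scheduler{
		loaded:  make(map[string]*runner),
		pending: make(chan *request, queueSize),
	}
}

// getRunner never blocks. It answers from the loaded set when a
// compatible runner exists, and otherwise hands the request to the queue.
func (s *scheduler) getRunner(model string, numCtx int) (chan *runner, chan error) {
	req := &request{
		model:     model,
		numCtx:    numCtx,
		successCh: make(chan *runner, 1), // buffered so a send never blocks
		errCh:     make(chan error, 1),
	}

	s.mu.Lock()
	r, ok := s.loaded[model]
	s.mu.Unlock()

	// Fast path: already loaded with matching parameters (assumed check).
	if ok && r.numCtx == numCtx {
		req.successCh <- r
		return req.successCh, req.errCh
	}

	// Slow path: enqueue for background processing, or fail fast if full.
	select {
	case s.pending <- req:
	default:
		req.errCh <- errQueueFull
	}
	return req.successCh, req.errCh
}

func main() {
	s := newScheduler(8)
	s.loaded["llama3"] = &runner{model: "llama3", numCtx: 4096}

	okCh, errCh := s.getRunner("llama3", 4096)
	select {
	case r := <-okCh:
		fmt.Println("reused runner for", r.model)
	case err := <-errCh:
		fmt.Println("error:", err)
	}

	s.getRunner("mistral", 4096) // not loaded, so it is queued
	fmt.Println("queued requests:", len(s.pending))
}
```

Buffering both channels with capacity 1 lets the sender deliver its single result without waiting for the caller to start receiving.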

The scheduler's background goroutines (processPending and processCompleted) handle model loading, GPU memory management, and eviction. NewLlamaServer creates the actual inference process, either as a llama.cpp subprocess or as a Go-native runner. GPUDevices discovers the available GPU hardware and its VRAM.
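To make the memory-management and eviction idea concrete, here is a small sketch of one possible admission policy: evict idle models, least recently used first, until the new model fits in a fixed VRAM budget. The `pool` and `loadedModel` types, the `admit` method, and the policy itself are assumptions made for illustration; Ollama's actual scheduler may decide differently.

```go
package main

import (
	"fmt"
	"sort"
)

// loadedModel is a hypothetical record of a model resident in VRAM.
type loadedModel struct {
	name     string
	vram     uint64 // bytes (or any consistent unit)
	refCount int    // requests currently using the model
	lastUsed int    // logical clock value of the last use
}

// pool is a toy VRAM budget tracker, not an Ollama type.
type pool struct {
	budget uint64
	models []*loadedModel
}

func (p *pool) used() uint64 {
	var total uint64
	for _, m := range p.models {
		total += m.vram
	}
	return total
}

func (p *pool) remove(victim *loadedModel) {
	kept := p.models[:0]
	for _, m := range p.models {
		if m != victim {
			kept = append(kept, m)
		}
	}
	p.models = kept
}

// admit makes room for a model that needs `need` units of VRAM.
// It reports which models were evicted and whether the load can proceed.
// Nothing is evicted when the model cannot fit even after evicting
// every idle model.
func (p *pool) admit(name string, need uint64, now int) ([]string, bool) {
	used := p.used()
	var idle []*loadedModel
	var reclaimable uint64
	for _, m := range p.models {
		if m.refCount == 0 {
			idle = append(idle, m)
			reclaimable += m.vram
		}
	}
	if used-reclaimable+need > p.budget {
		return nil, false
	}

	// Evict least recently used idle models first.
	sort.Slice(idle, func(i, j int) bool { return idle[i].lastUsed < idle[j].lastUsed })
	var evicted []string
	for _, victim := range idle {
		if used+need <= p.budget {
			break
		}
		p.remove(victim)
		used -= victim.vram
		evicted = append(evicted, victim.name)
	}
	p.models = append(p.models, &loadedModel{name: name, vram: need, lastUsed: now})
	return evicted, true
}

func main() {
	p := &pool{budget: 10, models: []*loadedModel{
		{name: "a", vram: 4, lastUsed: 1},
		{name: "b", vram: 4, refCount: 1, lastUsed: 2},
	}}
	evicted, ok := p.admit("c", 5, 3)
	fmt.Println(evicted, ok, p.used())
}
```

Models with a non-zero `refCount` are never evicted in this sketch, so an in-flight request keeps its runner; a request that cannot fit is simply refused rather than waited on.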

Usage

GetRunner is called internally by GenerateHandler and ChatHandler to obtain a model runner before processing an inference request. It is not directly accessible to API clients.

Code Reference

Source Location

  • Repository: ollama
  • File: server/sched.go (Scheduler.GetRunner), llm/server.go (NewLlamaServer), discover/runner.go (GPUDevices)
  • Lines: sched.go:L87-120 (GetRunner), server.go:L144-420 (NewLlamaServer), runner.go:L34-504 (GPUDevices)

Signature

func (s *Scheduler) GetRunner(
    c context.Context,
    m *Model,
    opts api.Options,
    sessionDuration *api.Duration,
    useImagegen bool,
) (chan *runnerRef, chan error)

func NewLlamaServer(
    systemInfo ml.SystemInfo,
    gpus []ml.DeviceInfo,
    modelPath string,
    f *ggml.GGML,
    adapters, projectors []string,
    opts api.Options,
    numParallel int,
) (LlamaServer, error)

Import

import "github.com/ollama/ollama/server"
import "github.com/ollama/ollama/llm"

I/O Contract

Inputs

Name            | Type            | Required | Description
c               | context.Context | Yes      | Request context for cancellation
m               | *Model          | Yes      | Parsed model with weights path, adapter paths, projector paths
opts            | api.Options     | Yes      | Runtime options: num_ctx, num_gpu, temperature, etc.
sessionDuration | *api.Duration   | No       | Keep-alive duration for the runner (default: 5 minutes)
useImagegen     | bool            | No       | Whether to use image generation mode
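Because sessionDuration is a pointer, a caller can leave it unset and fall back to the default. A minimal sketch of that pattern, using a plain `*time.Duration` in place of `*api.Duration` and a hypothetical `resolveKeepAlive` helper (neither is taken from the Ollama source):

```go
package main

import (
	"fmt"
	"time"
)

// defaultKeepAlive mirrors the 5-minute default stated in the table.
const defaultKeepAlive = 5 * time.Minute

// resolveKeepAlive is a hypothetical helper: a nil pointer means the
// caller did not specify a keep-alive, so the default applies.
func resolveKeepAlive(requested *time.Duration) time.Duration {
	if requested == nil {
		return defaultKeepAlive
	}
	return *requested
}

func main() {
	fmt.Println(resolveKeepAlive(nil)) // prints 5m0s

	short := 30 * time.Second
	fmt.Println(resolveKeepAlive(&short)) // prints 30s
}
```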

Outputs

Name      | Type            | Description
successCh | chan *runnerRef | Channel that receives the loaded runner reference
errCh     | chan error      | Channel that receives error if loading fails
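The two-channel contract can be illustrated end to end with a toy producer and consumer. In this sketch, `loadAsync` and `await` are hypothetical helpers and `runner` is a stand-in type; it assumes the producer sends exactly one value on exactly one of the two channels.

```go
package main

import (
	"errors"
	"fmt"
)

// runner stands in for *runnerRef (hypothetical type).
type runner struct{ model string }

// loadAsync is a hypothetical producer. It sends exactly one value,
// either a runner on the first channel or an error on the second.
func loadAsync(model string, fail bool) (<-chan *runner, <-chan error) {
	okCh := make(chan *runner, 1)
	errCh := make(chan error, 1)
	go func() {
		if fail {
			errCh <- errors.New("load failed: " + model)
			return
		}
		okCh <- &runner{model: model}
	}()
	return okCh, errCh
}

// await blocks until one of the two channels delivers a result.
func await(okCh <-chan *runner, errCh <-chan error) (*runner, error) {
	select {
	case r := <-okCh:
		return r, nil
	case err := <-errCh:
		return nil, err
	}
}

func main() {
	r, err := await(loadAsync("llama3", false))
	fmt.Println(r.model, err)

	_, err = await(loadAsync("broken", true))
	fmt.Println(err)
}
```

Wrapping the `select` in a helper like `await` turns the channel pair into an ordinary `(value, error)` return, which keeps handler code linear.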

Usage Examples

Internal Usage in ChatHandler

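// From server/routes.go ChatHandler
// Request a runner; the result arrives on one of the two channels.
rCh, eCh := s.sched.GetRunner(c.Request.Context(), model, opts, req.KeepAlive, false)
var runner *runnerRef
select {
case runner = <-rCh:
    // The runner is loaded and ready to use.
case err := <-eCh:
    // Loading failed: report the error and stop handling the request.
    handleError(c, err)
    return
}
// Use runner for inference...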

Related Pages

Implements Principle

Requires Environment

Uses Heuristic