Implementation:Ollama Ollama Scheduler GetRunner
| Knowledge Sources | |
|---|---|
| Domains | Systems, GPU_Computing, Model_Serving |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Scheduler.GetRunner is the Ollama scheduler's entry point for obtaining a loaded model inference runner.
Description
Scheduler.GetRunner is the primary interface for requesting a model runner. It returns a success channel and an error channel. If the requested model is already loaded and compatible with the request parameters, the existing runner is sent on the success channel immediately. Otherwise, the request is enqueued for the scheduler's background goroutines to process.
The scheduler's background goroutines (processPending and processCompleted) handle model loading, GPU memory management, and eviction. NewLlamaServer creates the actual inference process, either as a llama.cpp subprocess or as a Go-native runner. GPUDevices discovers the available GPU hardware and its VRAM.
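The fast-path and queueing behaviour described above can be illustrated with a toy scheduler. This is a minimal sketch, not Ollama's code: `toyScheduler`, `runner`, `request`, `getRunner`, the queue size, and the "queue is full" error are all hypothetical names and choices made for the example, and model loading is reduced to allocating a struct.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// runner stands in for a loaded model process (hypothetical type).
type runner struct{ model string }

// request is what gets queued when no compatible runner is loaded.
type request struct {
	ctx       context.Context
	model     string
	successCh chan *runner
	errCh     chan error
}

// toyScheduler is a simplified stand-in, not Ollama's Scheduler.
type toyScheduler struct {
	mu      sync.Mutex
	loaded  map[string]*runner
	pending chan *request
}

func newToyScheduler() *toyScheduler {
	return &toyScheduler{
		loaded:  make(map[string]*runner),
		pending: make(chan *request, 8), // queue size is an arbitrary choice
	}
}

// getRunner never blocks. A hit is delivered on the success channel right
// away; a miss is queued for the background loop.
func (s *toyScheduler) getRunner(ctx context.Context, model string) (chan *runner, chan error) {
	req := &request{
		ctx:       ctx,
		model:     model,
		successCh: make(chan *runner, 1), // buffered so senders never block
		errCh:     make(chan error, 1),
	}
	s.mu.Lock()
	r, ok := s.loaded[model]
	s.mu.Unlock()
	if ok {
		req.successCh <- r
		return req.successCh, req.errCh
	}
	select {
	case s.pending <- req:
	default:
		req.errCh <- errors.New("scheduler queue is full")
	}
	return req.successCh, req.errCh
}

// processPending "loads" each queued model and replies on the request's channels.
func (s *toyScheduler) processPending(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		case req := <-s.pending:
			if req.ctx.Err() != nil { // caller gave up while queued
				req.errCh <- req.ctx.Err()
				continue
			}
			s.mu.Lock()
			r, ok := s.loaded[req.model]
			if !ok {
				r = &runner{model: req.model} // real loading would happen here
				s.loaded[req.model] = r
			}
			s.mu.Unlock()
			req.successCh <- r
		}
	}
}

// demo requests the same model twice and reports whether the runner was reused.
func demo() (reused bool, err error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	s := newToyScheduler()
	go s.processPending(ctx)

	var got [2]*runner
	for i := range got {
		okCh, errCh := s.getRunner(ctx, "example-model")
		select {
		case got[i] = <-okCh:
		case err = <-errCh:
			return false, err
		}
	}
	return got[0] == got[1], nil
}

func main() {
	reused, err := demo()
	fmt.Println(reused, err) // true <nil>
}
```

In this sketch the first request misses and goes through the queue, while the second is served from the `loaded` map without touching the background goroutine. Buffering both channels with capacity 1 lets either side send a single result without waiting for a receiver.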
Usage
GetRunner is called internally by GenerateHandler and ChatHandler to obtain a model runner before processing an inference request. It is not directly accessible to API clients.
Code Reference
Source Location
- Repository: ollama
- File: server/sched.go (Scheduler.GetRunner), llm/server.go (NewLlamaServer), discover/runner.go (GPUDevices)
- Lines: sched.go:L87-120 (GetRunner), server.go:L144-420 (NewLlamaServer), runner.go:L34-504 (GPUDevices)
Signature
func (s *Scheduler) GetRunner(
	c context.Context,
	m *Model,
	opts api.Options,
	sessionDuration *api.Duration,
	useImagegen bool,
) (chan *runnerRef, chan error)

func NewLlamaServer(
	systemInfo ml.SystemInfo,
	gpus []ml.DeviceInfo,
	modelPath string,
	f *ggml.GGML,
	adapters, projectors []string,
	opts api.Options,
	numParallel int,
) (LlamaServer, error)
Import
import "github.com/ollama/ollama/server"
import "github.com/ollama/ollama/llm"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| c | context.Context | Yes | Request context for cancellation |
| m | *Model | Yes | Parsed model with weights path, adapter paths, projector paths |
| opts | api.Options | Yes | Runtime options: num_ctx, num_gpu, temperature, etc. |
| sessionDuration | *api.Duration | No (may be nil) | Keep-alive duration for the runner (default: 5 minutes) |
| useImagegen | bool | No (pass false for normal inference) | Whether to use image generation mode |
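The optional sessionDuration input follows the common Go pattern of using a nil pointer to mean "not specified". The sketch below shows one way such a value could be resolved to the 5-minute default listed in the table. `resolveKeepAlive` and `defaultKeepAlive` are hypothetical names, and a plain `*time.Duration` stands in for `*api.Duration`; how the scheduler itself applies the default is not shown here.

```go
package main

import (
	"fmt"
	"time"
)

// defaultKeepAlive mirrors the 5-minute default stated in the inputs table.
const defaultKeepAlive = 5 * time.Minute

// resolveKeepAlive is a hypothetical helper: nil means "use the default",
// any other value is used as given.
func resolveKeepAlive(d *time.Duration) time.Duration {
	if d == nil {
		return defaultKeepAlive
	}
	return *d
}

func main() {
	custom := 30 * time.Second
	fmt.Println(resolveKeepAlive(nil))     // 5m0s
	fmt.Println(resolveKeepAlive(&custom)) // 30s
}
```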
Outputs
| Name | Type | Description |
|---|---|---|
| successCh | chan *runnerRef | Channel that receives the loaded runner reference |
| errCh | chan error | Channel that receives an error if loading fails |
Usage Examples
Internal Usage in ChatHandler
// From server/routes.go ChatHandler
rCh, eCh := s.sched.GetRunner(c.Request.Context(), model, opts, req.KeepAlive, false)
var runner *runnerRef
// Block until the scheduler delivers a runner or reports a failure.
select {
case runner = <-rCh:
	// Runner is loaded and ready.
case err := <-eCh:
	handleError(c, err)
	return
}
// Use runner for inference...
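The two-channel wait in the snippet above can be wrapped in a small helper. The sketch below is an illustration, not part of Ollama: `awaitRunner`, the `runner` type, and the demo functions are hypothetical. It also adds a third case on `ctx.Done()` as a defensive measure, on the assumption that a caller may want to stop waiting when its own context is cancelled, whether or not the scheduler ever sends on either channel.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runner is a hypothetical stand-in for the scheduler's runner reference.
type runner struct{ name string }

// awaitRunner waits for whichever comes first: a runner, an error,
// or cancellation of the caller's context.
func awaitRunner(ctx context.Context, okCh <-chan *runner, errCh <-chan error) (*runner, error) {
	select {
	case r := <-okCh:
		return r, nil
	case err := <-errCh:
		return nil, err
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

// demoReady simulates a scheduler that delivers a runner.
func demoReady() (string, error) {
	okCh, errCh := make(chan *runner, 1), make(chan error, 1)
	okCh <- &runner{name: "runner-1"}
	r, err := awaitRunner(context.Background(), okCh, errCh)
	if err != nil {
		return "", err
	}
	return r.name, nil
}

// demoFailed simulates a scheduler that reports a load failure.
func demoFailed() error {
	okCh, errCh := make(chan *runner, 1), make(chan error, 1)
	errCh <- errors.New("load failed")
	_, err := awaitRunner(context.Background(), okCh, errCh)
	return err
}

// demoTimedOut simulates a caller whose deadline passes before any reply.
func demoTimedOut() bool {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()
	_, err := awaitRunner(ctx, make(chan *runner), make(chan error))
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	name, err := demoReady()
	fmt.Println(name, err)      // runner-1 <nil>
	fmt.Println(demoFailed())   // load failed
	fmt.Println(demoTimedOut()) // true
}
```

Taking receive-only channels (`<-chan`) in the helper's signature documents that the caller only reads results; bidirectional channels such as those returned by GetRunner convert to them implicitly.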