Implementation:Ollama Ollama MLXRunner Runner

Knowledge Sources	Ollama
Domains	MLX Runtime, Inference Server
Last Updated	2025-02-15 00:00 GMT

Overview

Defines the core Runner struct and request/response types for the MLX inference engine, including model loading, weight management, and the HTTP server event loop.

Description

The Runner holds the loaded model, tokenizer, a request channel, and KV cache entries. Load opens a model by name, creates the model via the base registry, then loads all tensor blobs from the manifest using a three-phase approach: (1) load raw tensors, (2) identify scale tensors, (3) remap bias tensors for quantized models. Run starts two goroutines: one consumes requests from the channel and runs pipelines, the other listens on an HTTP port. Request and Response types define the JSON wire format for the completion API.
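The three loading phases can be sketched as follows. This is a minimal illustration, not the runner's code: the `.scale`, `.bias`, and `_qbias` names and the `remap` helper are hypothetical, chosen only to show why scale tensors have to be identified in a separate pass before any bias tensor can be classified.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Hypothetical naming convention, assumed for illustration only:
// a quantized tensor "x" ships with companion blobs "x.scale" and "x.bias".
const (
	scaleSuffix = ".scale"
	biasSuffix  = ".bias"
	qbiasSuffix = "_qbias" // hypothetical name for a remapped quantization bias
)

// remap runs the three phases over a map of blob name to raw data.
func remap(blobs map[string][]byte) map[string][]byte {
	// Phase 1: load every raw tensor under its original name.
	raw := make(map[string][]byte, len(blobs))
	for name, data := range blobs {
		raw[name] = data
	}

	// Phase 2: record which base names have a scale tensor (are quantized).
	quantized := make(map[string]bool)
	for name := range raw {
		if base, ok := strings.CutSuffix(name, scaleSuffix); ok {
			quantized[base] = true
		}
	}

	// Phase 3: a ".bias" blob whose base is quantized is a quantization bias,
	// so it is renamed; an ordinary layer bias keeps its name.
	out := make(map[string][]byte, len(raw))
	for name, data := range raw {
		if base, ok := strings.CutSuffix(name, biasSuffix); ok && quantized[base] {
			name = base + qbiasSuffix
		}
		out[name] = data
	}
	return out
}

func main() {
	out := remap(map[string][]byte{
		"proj.weight":       {1},
		"proj.weight.scale": {2},
		"proj.weight.bias":  {3}, // quantization bias: remapped
		"proj.bias":         {4}, // layer bias: left alone
	})
	names := make([]string, 0, len(out))
	for name := range out {
		names = append(names, name)
	}
	sort.Strings(names)
	fmt.Println(names)
}
```

Because Go map iteration order is unspecified, a single pass could reach a bias tensor before its matching scale tensor; collecting the scale names first avoids that ordering dependency.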

Usage

Runner is the main entry point for the MLX runner subprocess: it orchestrates model loading and the processing of inference requests.
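The two-goroutine arrangement that Run is described as using can be sketched with the standard library alone. Everything here is a simplified assumption: the `Request` and `Runner` types are cut-down stand-ins, `echoPipeline`, `consume`, `complete`, and the `/completion` route are hypothetical, and this `Run` takes a listener and a `done` channel rather than the `host, port` pair of the real signature.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"strings"
)

// Request is a simplified stand-in for the runner's Request type.
type Request struct {
	Prompt    string
	Responses chan string
	Pipeline  func(Request) error
}

// Runner keeps only the request channel for this sketch.
type Runner struct {
	Requests chan Request
}

// consume runs one pipeline at a time until done is closed.
func (r *Runner) consume(done <-chan struct{}) error {
	for {
		select {
		case <-done:
			return nil
		case req := <-r.Requests:
			if err := req.Pipeline(req); err != nil {
				req.Responses <- "error: " + err.Error()
			}
			close(req.Responses) // tells the reader the stream has ended
		}
	}
}

// echoPipeline is a hypothetical pipeline that streams the prompt back word by word.
func echoPipeline(req Request) error {
	for _, w := range strings.Fields(req.Prompt) {
		req.Responses <- w
	}
	return nil
}

// complete enqueues a prompt and collects the streamed words.
func (r *Runner) complete(prompt string) string {
	req := Request{Prompt: prompt, Responses: make(chan string), Pipeline: echoPipeline}
	r.Requests <- req
	var parts []string
	for s := range req.Responses {
		parts = append(parts, s)
	}
	return strings.Join(parts, " ")
}

// Run starts both goroutines and returns whichever result arrives first.
func (r *Runner) Run(done <-chan struct{}, ln net.Listener, mux http.Handler) error {
	errc := make(chan error, 2) // buffered so neither goroutine blocks on send
	go func() { errc <- r.consume(done) }()
	go func() { errc <- http.Serve(ln, mux) }()
	return <-errc
}

func main() {
	r := &Runner{Requests: make(chan Request)}
	mux := http.NewServeMux()
	mux.HandleFunc("/completion", func(w http.ResponseWriter, req *http.Request) {
		fmt.Fprint(w, r.complete(req.URL.Query().Get("prompt")))
	})

	ln, err := net.Listen("tcp", "127.0.0.1:0") // port 0 lets the OS pick a free port
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	done := make(chan struct{})
	result := make(chan error, 1)
	go func() { result <- r.Run(done, ln, mux) }()

	resp, err := http.Get("http://" + ln.Addr().String() + "/completion?prompt=hello+world")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body))

	close(done) // stop the consumer, which makes Run return
	fmt.Println("run returned:", <-result)
}
```

Funnelling every request through one channel means pipelines execute one at a time, while the HTTP goroutine stays free to accept connections and hand work to the consumer.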

Code Reference

Source Location

Repository: Ollama
File: x/mlxrunner/runner.go
Lines: 1-174

Signature

type Runner struct {
    Model        base.Model
    Tokenizer    *tokenizer.Tokenizer
    Requests     chan Request
    CacheEntries map[int32]*CacheEntry
}

type Request struct {
    TextCompletionsRequest
    Responses chan Response
    Pipeline  func(Request) error
    sample.Sampler
}

type Response struct {
    Text             string `json:"content,omitempty"`
    Token            int    `json:"token,omitempty"`
    Done             bool   `json:"done,omitempty"`
    DoneReason       int    `json:"done_reason,omitempty"`
    PromptTokens     int    `json:"prompt_eval_count,omitempty"`
    CompletionTokens int    `json:"eval_count,omitempty"`
}

func (r *Runner) Load(modelName string) error
func (r *Runner) Run(host, port string, mux http.Handler) error
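
To make the wire format concrete, the sketch below marshals two Response values using the field tags from the signature above. The struct is copied from this page; the `encode` helper and the sample values are illustrative assumptions, not part of the runner.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Response mirrors the struct in the signature above, with the same JSON tags.
type Response struct {
	Text             string `json:"content,omitempty"`
	Token            int    `json:"token,omitempty"`
	Done             bool   `json:"done,omitempty"`
	DoneReason       int    `json:"done_reason,omitempty"`
	PromptTokens     int    `json:"prompt_eval_count,omitempty"`
	CompletionTokens int    `json:"eval_count,omitempty"`
}

// encode is a hypothetical helper that returns the JSON form of a Response.
func encode(r Response) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// A streamed chunk carrying one token (sample values).
	fmt.Println(encode(Response{Text: "Hello", Token: 9906}))
	// prints {"content":"Hello","token":9906}

	// A final message carrying the token counts (sample values).
	fmt.Println(encode(Response{Done: true, PromptTokens: 12, CompletionTokens: 34}))
	// prints {"done":true,"prompt_eval_count":12,"eval_count":34}
}
```

Since every field is tagged `omitempty`, zero values never appear on the wire: a token ID of 0 or a `done_reason` of 0 is indistinguishable from an absent field, so clients should treat a missing key as the zero value.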

Import

import "github.com/ollama/ollama/x/mlxrunner"

I/O Contract

Inputs

Name	Type	Required	Description
modelName	string	Yes	Model name to load from manifest
host	string	Yes	Host to listen on (e.g. "127.0.0.1")
port	string	Yes	Port number for HTTP server
mux	http.Handler	Yes	HTTP handler (for example an *http.ServeMux) that serves the runner's routes

Outputs

Name	Type	Description
error	error	Non-nil if model loading or server startup fails

Usage Examples

runner := Runner{
    Requests:     make(chan Request),
    CacheEntries: make(map[int32]*CacheEntry),
}

if err := runner.Load("my-model:latest"); err != nil {
    log.Fatal(err)
}

mux := http.NewServeMux()
// ... register handlers ...
// Check the error from Run rather than discarding it.
if err := runner.Run("127.0.0.1", "8080", mux); err != nil {
    log.Fatal(err)
}

Related Pages

Principle:Ollama_Ollama_MLXRunner_Architecture
