Implementation:Ollama Ollama MLXRunner Runner

From Leeroopedia
Knowledge Sources
Domains: MLX Runtime, Inference Server
Last Updated: 2025-02-15 00:00 GMT

Overview

runner.go defines the core Runner struct and the request/response types for the MLX inference engine, and implements model loading, weight management, and the HTTP server event loop.

Description

The Runner holds the loaded model, the tokenizer, a request channel, and the KV cache entries.

Load opens a model by name, creates it through the base registry, and then loads every tensor blob listed in the manifest in three phases: (1) load the raw tensors, (2) identify the scale tensors, and (3) remap the bias tensors for quantized models.

Run starts two goroutines: one consumes requests from the channel and runs each request's pipeline, and the other serves HTTP on the given host and port.

The Request and Response types define the JSON wire format for the completion API.
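
The three loading phases described above can be illustrated with a small, self-contained sketch. The source does not state the tensor naming scheme or exactly what "remap" does, so everything specific here is an assumption: the suffixes `.scales`, `.bias`, and `.qbias`, the helper name `remapQuantized`, and the rule that a `.bias` tensor belonging to a quantized layer is renamed so it is not confused with an ordinary layer bias. Treat it as one plausible reading, not as the code in runner.go.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// remapQuantized is a hypothetical helper. Its input stands in for phase 1:
// the raw tensors as loaded from the manifest, keyed by name. The suffixes
// used below are assumptions made for this sketch only.
func remapQuantized(raw map[string][]float32) map[string][]float32 {
	// Phase 2: a layer counts as quantized if it ships a scale tensor.
	quantized := make(map[string]bool)
	for name := range raw {
		if strings.HasSuffix(name, ".scales") {
			quantized[strings.TrimSuffix(name, ".scales")] = true
		}
	}

	// Phase 3: for quantized layers, rename the bias tensor so that it is
	// treated as a quantization bias rather than a regular layer bias.
	out := make(map[string][]float32, len(raw))
	for name, tensor := range raw {
		if strings.HasSuffix(name, ".bias") {
			base := strings.TrimSuffix(name, ".bias")
			if quantized[base] {
				name = base + ".qbias"
			}
		}
		out[name] = tensor
	}
	return out
}

func main() {
	raw := map[string][]float32{
		"fc.weight": {1},
		"fc.scales": {2},
		"fc.bias":   {3}, // belongs to a quantized layer, so it is renamed
		"norm.bias": {4}, // no scale tensor for "norm", so it is left alone
	}
	out := remapQuantized(raw)

	names := make([]string, 0, len(out))
	for name := range out {
		names = append(names, name)
	}
	sort.Strings(names)
	fmt.Println(names)
}
```

Splitting detection (phase 2) from renaming (phase 3) means the result does not depend on map iteration order: every scale tensor has been seen before any bias tensor is renamed.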

Usage

Runner is the main entry point for the MLX runner subprocess. It orchestrates model loading and the processing of inference requests.
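
A minimal sketch of the two-goroutine arrangement that Run is described as using: one goroutine drains the request channel and runs each request's pipeline, and the other serves HTTP. The types below are simplified stand-ins, not the real ones. The `Prompt` field, the `consume` method, `echoPipeline`, and the choice to return the first error from either goroutine are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
)

// Simplified stand-ins for the real types; field sets are assumptions.
type Response struct {
	Text string
	Done bool
}

type Request struct {
	Prompt    string // hypothetical; stands in for the embedded request body
	Responses chan Response
	Pipeline  func(Request) error
}

type Runner struct {
	Requests chan Request
}

// consume runs each queued request's pipeline in order. It returns when the
// channel is closed or when a pipeline fails.
func (r *Runner) consume() error {
	for req := range r.Requests {
		if err := req.Pipeline(req); err != nil {
			return fmt.Errorf("pipeline: %w", err)
		}
	}
	return nil
}

// Run starts the worker and the HTTP server and returns the first result
// reported by either goroutine.
func (r *Runner) Run(host, port string, mux http.Handler) error {
	errc := make(chan error, 2) // buffered so neither goroutine blocks on exit
	go func() { errc <- r.consume() }()
	go func() { errc <- http.ListenAndServe(net.JoinHostPort(host, port), mux) }()
	return <-errc
}

// echoPipeline is a toy pipeline: it streams the prompt back, then a final
// Done message, and closes the response channel.
func echoPipeline(req Request) error {
	defer close(req.Responses)
	req.Responses <- Response{Text: req.Prompt}
	req.Responses <- Response{Done: true}
	return nil
}

func main() {
	r := &Runner{Requests: make(chan Request)}
	go r.consume() // worker only; the HTTP half is skipped so the demo exits

	req := Request{Prompt: "hello", Responses: make(chan Response), Pipeline: echoPipeline}
	r.Requests <- req
	for resp := range req.Responses {
		fmt.Printf("%+v\n", resp)
	}
}
```

Funnelling all requests through a single consumer serializes pipeline execution, which fits a design where one loaded model is shared by every HTTP handler. Each request carries its own Responses channel, so the handler that enqueued it can stream results back to its client.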

Code Reference

Source Location

  • Repository: Ollama
  • File: x/mlxrunner/runner.go
  • Lines: 1-174

Signature

type Runner struct {
    Model        base.Model
    Tokenizer    *tokenizer.Tokenizer
    Requests     chan Request
    CacheEntries map[int32]*CacheEntry
}

type Request struct {
    TextCompletionsRequest
    Responses chan Response
    Pipeline  func(Request) error
    sample.Sampler
}

type Response struct {
    Text             string `json:"content,omitempty"`
    Token            int    `json:"token,omitempty"`
    Done             bool   `json:"done,omitempty"`
    DoneReason       int    `json:"done_reason,omitempty"`
    PromptTokens     int    `json:"prompt_eval_count,omitempty"`
    CompletionTokens int    `json:"eval_count,omitempty"`
}

func (r *Runner) Load(modelName string) error
func (r *Runner) Run(host, port string, mux http.Handler) error
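
To make the wire format concrete, the sketch below copies the Response struct from the signature above and marshals a few values with the standard encoding/json package. The field values and the `encode` helper are illustrative only. The example shows how the `omitempty` tags shape the output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Response is copied from the signature above.
type Response struct {
	Text             string `json:"content,omitempty"`
	Token            int    `json:"token,omitempty"`
	Done             bool   `json:"done,omitempty"`
	DoneReason       int    `json:"done_reason,omitempty"`
	PromptTokens     int    `json:"prompt_eval_count,omitempty"`
	CompletionTokens int    `json:"eval_count,omitempty"`
}

// encode is a hypothetical helper that returns the JSON form of a Response.
func encode(r Response) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err) // not expected for a struct of plain fields
	}
	return string(b)
}

func main() {
	// A streamed token: only the non-zero fields are emitted.
	fmt.Println(encode(Response{Text: "Hi", Token: 42}))
	// → {"content":"Hi","token":42}

	// A final message carrying the token counts.
	fmt.Println(encode(Response{Done: true, DoneReason: 1, PromptTokens: 5, CompletionTokens: 7}))
	// → {"done":true,"done_reason":1,"prompt_eval_count":5,"eval_count":7}

	// A zero token ID is dropped by omitempty.
	fmt.Println(encode(Response{Text: "x", Token: 0}))
	// → {"content":"x"}
}
```

Because every field uses `omitempty`, a zero value is indistinguishable from an absent one on the wire: a token ID of 0 or a done_reason of 0 never appears in the JSON, so clients should treat a missing key as the zero value.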

Import

import "github.com/ollama/ollama/x/mlxrunner"

I/O Contract

Inputs

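Name | Type | Required | Description
modelName | string | Yes | Name of the model to load from the manifest (Load)
host | string | Yes | Host to listen on, e.g. "127.0.0.1" (Run)
port | string | Yes | Port number for the HTTP server (Run)
mux | http.Handler | Yes | HTTP handler passed to Run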
Outputs

Name | Type | Description
error | error | Non-nil if model loading or server startup fails

Usage Examples

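// Create a runner with a request channel and an empty KV cache map.
runner := Runner{
    Requests:     make(chan Request),
    CacheEntries: make(map[int32]*CacheEntry),
}

// Load the model before serving any requests.
if err := runner.Load("my-model:latest"); err != nil {
    log.Fatal(err)
}

mux := http.NewServeMux()
// ... register handlers ...

// Run returns an error, so check it instead of discarding it.
if err := runner.Run("127.0.0.1", "8080", mux); err != nil {
    log.Fatal(err)
}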
