# Implementation: Ollama MLXRunner Client
| Knowledge Sources | |
|---|---|
| Domains | MLX Runtime, Inference Server |
| Last Updated | 2025-02-15 00:00 GMT |
## Overview
Wraps an MLX runner subprocess to implement the llm.LlamaServer interface, enabling Ollama to serve MLX-based models through its standard server architecture.
## Description
The Client struct manages the lifecycle of an MLX runner subprocess. NewClient spawns the subprocess (ollama runner --mlx-engine) on a dynamically allocated port, configures library paths for MLX discovery, estimates VRAM from the model manifest, and polls the health endpoint until the runner is ready. It implements the full llm.LlamaServer interface by proxying HTTP requests to the subprocess for completion, tokenization, and health checks.
## Usage
Used by the Ollama server to serve safetensors LLM models via the MLX inference engine, providing the same API surface as GGUF models served through llama.cpp.
## Code Reference
### Source Location
- Repository: Ollama
- File: x/mlxrunner/client.go
- Lines: 1-414
### Signature

```go
type Client struct {
	port        int
	modelName   string
	vramSize    uint64
	done        chan error
	client      *http.Client
	lastErr     string
	lastErrLock sync.Mutex
	mu          sync.Mutex
	cmd         *exec.Cmd
}

func NewClient(modelName string) (*Client, error)
func (c *Client) Close() error
func (c *Client) Completion(ctx context.Context, req llm.CompletionRequest, fn func(llm.CompletionResponse)) error
func (c *Client) Ping(ctx context.Context) error
func (c *Client) Tokenize(ctx context.Context, content string) ([]int, error)

var _ llm.LlamaServer = (*Client)(nil)
```
### Import

```go
import "github.com/ollama/ollama/x/mlxrunner"
```
## I/O Contract
### Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| modelName | string | Yes | Name of the model to load |
### Outputs
| Name | Type | Description |
|---|---|---|
| *Client | *Client | Initialized client connected to running MLX subprocess |
| error | error | Non-nil if subprocess fails to start or become ready |
## Usage Examples

```go
client, err := mlxrunner.NewClient("my-model:latest")
if err != nil {
	log.Fatal(err)
}
defer client.Close()

// Stream a completion; the callback is invoked once per response chunk.
ctx := context.Background()
err = client.Completion(ctx, llm.CompletionRequest{
	Prompt: "Hello, world!",
}, func(resp llm.CompletionResponse) {
	fmt.Print(resp.Content)
})
if err != nil {
	log.Fatal(err)
}
```