Implementation: Ollama MLXRunner Server
| Knowledge Sources | |
|---|---|
| Domains | MLX Runtime, Inference Server |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
HTTP server entry point for the MLX runner subprocess, providing REST endpoints for health checks, model management, text completion, and tokenization.
Description
Execute parses command-line flags (model name, port), initializes MLX, loads the model, and registers HTTP handlers:
- GET /v1/status: health check
- /v1/models: model management
- POST /v1/completions: streaming text generation (JSONL, one JSON object per line)
- POST /v1/tokenize: encode text to token IDs
Legacy redirect routes are kept for backward compatibility. A statusRecorder middleware provides structured logging of request metrics, and responses are flushed per line for real-time streaming.
Usage
Launched as a subprocess by mlxrunner.NewClient when serving MLX-based models. Runs independently on a local port.
Code Reference
Source Location
- Repository: Ollama
- File: x/mlxrunner/server.go
- Lines: 1-182
Signature
func Execute(args []string) error
type statusRecorder struct {
	http.ResponseWriter
	code int
}
func (w *statusRecorder) WriteHeader(code int)
func (w *statusRecorder) Status() string
func (w *statusRecorder) Flush()
Import
import "github.com/ollama/ollama/x/mlxrunner"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| args | []string | Yes | Command-line args: -model, -port, -verbose |
Outputs
| Name | Type | Description |
|---|---|---|
| error | error | Non-nil if MLX init, model loading, or server fails |
Usage Examples
// Typically invoked as a subprocess:
// ollama runner --mlx-engine -model my-model -port 12345
if err := mlxrunner.Execute([]string{"-model", "my-model", "-port", "12345"}); err != nil {
	log.Fatal(err) // requires the "log" import
}