Implementation: Ollama MLXRunner Server
| Knowledge Sources | |
|---|---|
| Domains | MLX Runtime, Inference Server |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
HTTP server entry point for the MLX runner subprocess, providing REST endpoints for health checks, model management, text completion, and tokenization.
Description
Execute parses command-line flags (model name, port), initializes MLX, loads the model, and registers HTTP handlers:
- GET /v1/status: health check
- /v1/models: model management
- POST /v1/completions: streaming text generation (JSONL, one JSON object per line)
- POST /v1/tokenize: encode text to token IDs
Legacy redirect routes are kept for backward compatibility. A statusRecorder middleware provides structured logging of request metrics, and responses are flushed per line for real-time streaming.
Usage
Launched as a subprocess by mlxrunner.NewClient when serving MLX-based models. Runs independently on a local port.
Code Reference
Source Location
- Repository: Ollama
- File: x/mlxrunner/server.go
- Lines: 1-182
Signature
func Execute(args []string) error
type statusRecorder struct {
	http.ResponseWriter
	code int
}
func (w *statusRecorder) WriteHeader(code int)
func (w *statusRecorder) Status() string
func (w *statusRecorder) Flush()
Import
import "github.com/ollama/ollama/x/mlxrunner"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| args | []string | Yes | Command-line args: -model, -port, -verbose |
Outputs
| Name | Type | Description |
|---|---|---|
| error | error | Non-nil if MLX init, model loading, or server fails |
Usage Examples
// Typically invoked as a subprocess:
// ollama runner --mlx-engine -model my-model -port 12345
if err := mlxrunner.Execute([]string{"-model", "my-model", "-port", "12345"}); err != nil {
	log.Fatal(err) // requires the "log" import
}