Implementation:Ollama Ollama MLXRunner Runner
| Knowledge Sources | |
|---|---|
| Domains | MLX Runtime, Inference Server |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Defines the core Runner struct and request/response types for the MLX inference engine, including model loading, weight management, and the HTTP server event loop.
Description
The Runner holds the loaded model, the tokenizer, a request channel, and the KV cache entries. Load opens a model by name, creates the model via the base registry, and then loads all tensor blobs from the manifest in three phases: (1) load the raw tensors, (2) identify the scale tensors, and (3) remap the bias tensors for quantized models. Run starts two goroutines: one consumes requests from the channel and runs each request's pipeline, and the other serves HTTP on the given host and port. The Request and Response types define the JSON wire format for the completion API.
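The two-goroutine structure of Run can be illustrated with a minimal sketch. Everything here is a simplified, hypothetical stand-in: `request`, `consume`, `run`, and `echoPipeline` are not taken from runner.go, and the way the real Run coordinates errors and shutdown is not described in the source. The HTTP listener is replaced by a `serve` callback so the sketch runs without binding a port.

```go
package main

import "fmt"

// request is a simplified, hypothetical stand-in for the runner's Request type.
type request struct {
	Prompt    string
	Responses chan string
	Pipeline  func(request) error
}

// consume drains the request channel and runs each request's pipeline.
// Assumption: this sketch stops at the first pipeline error; the real
// runner may handle errors differently.
func consume(requests <-chan request) error {
	for req := range requests {
		if err := req.Pipeline(req); err != nil {
			return err
		}
	}
	return nil
}

// run starts the two goroutines and returns the result of whichever
// finishes first. In a real server, serve would wrap something like
// http.ListenAndServe(net.JoinHostPort(host, port), mux).
func run(requests <-chan request, serve func() error) error {
	errs := make(chan error, 2) // buffered so neither goroutine blocks on exit
	go func() { errs <- consume(requests) }()
	go func() { errs <- serve() }()
	return <-errs
}

// echoPipeline is a hypothetical pipeline that sends the prompt back
// and then closes the response channel.
func echoPipeline(req request) error {
	defer close(req.Responses)
	req.Responses <- "echo: " + req.Prompt
	return nil
}

func main() {
	requests := make(chan request)
	stop := make(chan struct{})
	serve := func() error { <-stop; return nil } // stand-in for the HTTP listener

	done := make(chan error, 1)
	go func() { done <- run(requests, serve) }()

	req := request{Prompt: "hi", Responses: make(chan string, 1), Pipeline: echoPipeline}
	requests <- req
	fmt.Println(<-req.Responses)

	close(requests) // the consumer exits, so run returns
	fmt.Println("run returned:", <-done)
	close(stop)
}
```

Carrying the pipeline inside each request keeps the consumer loop generic: it does not need to know what kind of work a request performs.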
Usage
Runner is the main entry point for the MLX runner subprocess. It orchestrates model loading and the processing of inference requests.
Code Reference
Source Location
- Repository: Ollama
- File: x/mlxrunner/runner.go
- Lines: 1-174
Signature
type Runner struct {
    Model        base.Model
    Tokenizer    *tokenizer.Tokenizer
    Requests     chan Request
    CacheEntries map[int32]*CacheEntry
}

type Request struct {
    TextCompletionsRequest
    Responses chan Response
    Pipeline  func(Request) error
    sample.Sampler
}

type Response struct {
    Text             string `json:"content,omitempty"`
    Token            int    `json:"token,omitempty"`
    Done             bool   `json:"done,omitempty"`
    DoneReason       int    `json:"done_reason,omitempty"`
    PromptTokens     int    `json:"prompt_eval_count,omitempty"`
    CompletionTokens int    `json:"eval_count,omitempty"`
}

func (r *Runner) Load(modelName string) error
func (r *Runner) Run(host, port string, mux http.Handler) error
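The JSON tags on Response determine what a client sees on the wire. The sketch below copies the struct from the signature and encodes two values with the standard `encoding/json` package; the `encode` helper and the sample values are illustrative only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Response copies the field names and JSON tags from the signature.
type Response struct {
	Text             string `json:"content,omitempty"`
	Token            int    `json:"token,omitempty"`
	Done             bool   `json:"done,omitempty"`
	DoneReason       int    `json:"done_reason,omitempty"`
	PromptTokens     int    `json:"prompt_eval_count,omitempty"`
	CompletionTokens int    `json:"eval_count,omitempty"`
}

// encode is a hypothetical helper that marshals a Response to a JSON string.
func encode(r Response) string {
	b, err := json.Marshal(r)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// A streamed chunk: only the non-zero fields are emitted.
	fmt.Println(encode(Response{Text: "Hello", Token: 42}))
	// A final message carrying the token counts.
	fmt.Println(encode(Response{Done: true, PromptTokens: 5, CompletionTokens: 12}))
}
```

Because every field uses `omitempty`, zero values are dropped from the output. A token ID of 0 or a DoneReason of 0 is therefore indistinguishable from an absent field on the client side.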
Import
import "github.com/ollama/ollama/x/mlxrunner"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| modelName | string | Yes | Model name to load from manifest |
| host | string | Yes | Host to listen on (e.g. "127.0.0.1") |
| port | string | Yes | Port number for HTTP server |
| mux | http.Handler | Yes | HTTP handler that serves incoming requests |
Outputs
| Name | Type | Description |
|---|---|---|
| error | error | Non-nil if model loading or server startup fails |
Usage Examples
runner := Runner{
    Requests:     make(chan Request),
    CacheEntries: make(map[int32]*CacheEntry),
}
if err := runner.Load("my-model:latest"); err != nil {
    log.Fatal(err)
}

mux := http.NewServeMux()
// ... register handlers ...

// Run returns an error if the server fails, so check it.
if err := runner.Run("127.0.0.1", "8080", mux); err != nil {
    log.Fatal(err)
}