Implementation:Ollama Ollama MLXRunner Pipeline
| Knowledge Sources | |
|---|---|
| Domains | MLX Runtime, Text Generation |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements the text generation pipeline for the MLX runner, coordinating prompt processing, autoregressive token generation, sampling, and KV cache management.
Description
TextGenerationPipeline first looks up the nearest cached KV state for the input tokens. It then processes the prompt in chunks of up to 2048 tokens, evaluating the KV cache state after each chunk. Next it runs autoregressive generation: at each step it computes logits via Forward/Unembed, applies log-softmax to obtain logprobs, samples the next token, and streams the decoded text through the response channel. The pipeline uses AsyncEval for pipelining, so the next step's evaluation starts while the current token is being processed. Decode handles UTF-8 reassembly of partial characters that span token boundaries.
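The prompt-processing stage can be pictured with a minimal sketch. It is not the code from pipeline.go: the names `commonPrefixLen`, `chunkTokens`, and `prefillChunkSize` are hypothetical, and the sketch assumes that "nearest cached KV state" means the longest shared token prefix and that only the tokens after that prefix need to be evaluated.

```go
package main

import "fmt"

// prefillChunkSize mirrors the 2048-token limit mentioned in the description.
const prefillChunkSize = 2048

// commonPrefixLen returns how many leading tokens the two sequences share.
// It stands in for the cache lookup, which is an assumption of this sketch.
func commonPrefixLen(cached, prompt []int32) int {
	n := 0
	for n < len(cached) && n < len(prompt) && cached[n] == prompt[n] {
		n++
	}
	return n
}

// chunkTokens splits tokens into consecutive slices of at most size elements.
// It returns nil when size is not positive.
func chunkTokens(tokens []int32, size int) [][]int32 {
	if size <= 0 {
		return nil
	}
	var chunks [][]int32
	for len(tokens) > 0 {
		n := size
		if len(tokens) < n {
			n = len(tokens)
		}
		chunks = append(chunks, tokens[:n])
		tokens = tokens[n:]
	}
	return chunks
}

func main() {
	// Build a 5000-token prompt whose first 1000 tokens are already cached.
	prompt := make([]int32, 5000)
	for i := range prompt {
		prompt[i] = int32(i)
	}
	cached := prompt[:1000]

	reused := commonPrefixLen(cached, prompt)
	for i, c := range chunkTokens(prompt[reused:], prefillChunkSize) {
		// A real pipeline would run the model on c and evaluate the cache here.
		fmt.Printf("chunk %d: %d tokens\n", i, len(c))
	}
}
```

Bounding the chunk size keeps the amount of work queued per evaluation predictable, which is presumably why the cache state is evaluated after each chunk rather than once at the end.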
Usage
This is the core text generation loop, invoked for each completion request in the MLX inference server.
Code Reference
Source Location
- Repository: Ollama
- File: x/mlxrunner/pipeline.go
- Lines: 1-126
Signature
func (r *Runner) TextGenerationPipeline(request Request) error
func (r Runner) Decode(sample int32, b *bytes.Buffer) string
Import
import "github.com/ollama/ollama/x/mlxrunner"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| request | Request | Yes | Contains prompt, sampling params, and response channel |
Outputs
| Name | Type | Description |
|---|---|---|
| error | error | Non-nil if the model is not loaded |
| (channel) | chan Response | Streaming responses sent to request.Responses |
Usage Examples
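// Build a request; the response channel carries the streamed output.
request := Request{
	TextCompletionsRequest: TextCompletionsRequest{
		Prompt: "Hello, world!",
		Options: struct{ ... }{MaxTokens: 100, Temperature: 0.7},
	},
	Responses: make(chan Response),
	Pipeline: runner.TextGenerationPipeline,
	Sampler: sample.New(0.7, 0, 0, 0),
}
// Start the consumer first so the pipeline never blocks on an unread channel.
go func() {
	for resp := range request.Responses {
		if resp.Done {
			break
		}
		fmt.Print(resp.Text)
	}
}()
// Hand the request to the runner for processing.
runner.Requests <- request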