Implementation:Ollama Ollama MLXRunner Pipeline

Knowledge Sources	Ollama
Domains	MLX Runtime, Text Generation
Last Updated	2025-02-15 00:00 GMT

Overview

Implements the text generation pipeline for the MLX runner, coordinating prompt processing, autoregressive token generation, sampling, and KV cache management.

Description

TextGenerationPipeline first looks up the nearest cached KV state for the input tokens. It then processes the prompt in chunks of up to 2048 tokens, evaluating the KV cache state after each chunk. Next it runs autoregressive generation: at each step it computes logits via Forward/Unembed, applies log-softmax to obtain logprobs, samples the next token, and streams the decoded text through the response channel. The loop uses AsyncEval for pipelining, so evaluation of the next step starts while the current token is still being processed. Decode reassembles UTF-8 characters whose bytes are split across token boundaries.
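Two of the steps described above, chunked prompt processing and the log-softmax applied to the logits, can be sketched in plain Go. This is a minimal illustration under stated assumptions, not the runner's code: chunkTokens and logSoftmax are hypothetical helper names, the logits are ordinary float64 slices rather than MLX arrays, and only the 2048-token chunk limit is taken from the description.

```go
package main

import (
	"fmt"
	"math"
)

// chunkSize mirrors the 2048-token limit mentioned in the description.
const chunkSize = 2048

// chunkTokens (hypothetical helper) splits tokens into consecutive
// slices of at most size elements, preserving order.
func chunkTokens(tokens []int32, size int) [][]int32 {
	if size <= 0 {
		return nil
	}
	var chunks [][]int32
	for len(tokens) > 0 {
		n := size
		if len(tokens) < n {
			n = len(tokens)
		}
		chunks = append(chunks, tokens[:n])
		tokens = tokens[n:]
	}
	return chunks
}

// logSoftmax (hypothetical helper) returns log(softmax(logits)).
// Subtracting the maximum logit first keeps math.Exp from overflowing.
func logSoftmax(logits []float64) []float64 {
	if len(logits) == 0 {
		return nil
	}
	maxLogit := logits[0]
	for _, v := range logits[1:] {
		if v > maxLogit {
			maxLogit = v
		}
	}
	var sum float64
	for _, v := range logits {
		sum += math.Exp(v - maxLogit)
	}
	logSum := maxLogit + math.Log(sum)
	out := make([]float64, len(logits))
	for i, v := range logits {
		out[i] = v - logSum
	}
	return out
}

func main() {
	// A 5000-token prompt yields three chunks: 2048, 2048, and 904 tokens.
	prompt := make([]int32, 5000)
	for i, chunk := range chunkTokens(prompt, chunkSize) {
		fmt.Printf("chunk %d: %d tokens\n", i, len(chunk))
	}

	// Log-probabilities for a toy three-entry vocabulary.
	fmt.Println(logSoftmax([]float64{2.0, 1.0, 0.1}))
}
```

In the pipeline itself the KV cache would be evaluated after each chunk, and the logprobs would feed the sampler; both are omitted here because they depend on MLX types not shown on this page.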

Usage

TextGenerationPipeline is the core text generation loop, invoked for each completion request in the MLX inference server.

Code Reference

Source Location

Repository: Ollama
File: x/mlxrunner/pipeline.go
Lines: 1-126

Signature

func (r *Runner) TextGenerationPipeline(request Request) error
func (r Runner) Decode(sample int32, b *bytes.Buffer) string

Import

import "github.com/ollama/ollama/x/mlxrunner"

I/O Contract

Inputs

Name	Type	Required	Description
request	Request	Yes	Contains prompt, sampling params, and response channel

Outputs

Name	Type	Description
error	error	Non-nil if the model is not loaded
(channel)	chan Response	Streaming responses sent to request.Responses

Usage Examples

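// Build a completion request; generated text is streamed over Responses.
request := Request{
    TextCompletionsRequest: TextCompletionsRequest{
        Prompt: "Hello, world!",
        Options: struct{ ... }{MaxTokens: 100, Temperature: 0.7},
    },
    Responses: make(chan Response),
    Pipeline:  runner.TextGenerationPipeline,
    Sampler:   sample.New(0.7, 0, 0, 0),
}

// Consume streamed responses in the background, printing text
// until the response marked Done arrives.
go func() {
    for resp := range request.Responses {
        if resp.Done {
            break
        }
        fmt.Print(resp.Text)
    }
}()

// Submit the request to the runner's queue.
runner.Requests <- request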
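The example above starts the consumer in a goroutine and does not wait for it, so a caller that needs the full completion has to synchronize on its own. The following self-contained sketch shows one way to do that. It assumes a stand-in Response type with only the Text and Done fields read above, a hypothetical collect helper, and a fake producer in place of the runner; whether the final Done response also carries text is not stated on this page, so the sketch ignores it.

```go
package main

import (
	"fmt"
	"strings"
)

// Response is a stand-in with only the two fields the usage example reads.
type Response struct {
	Text string
	Done bool
}

// collect (hypothetical helper) concatenates streamed text until a
// response marked Done arrives or the channel is closed.
func collect(responses <-chan Response) string {
	var sb strings.Builder
	for resp := range responses {
		if resp.Done {
			break
		}
		sb.WriteString(resp.Text)
	}
	return sb.String()
}

func main() {
	responses := make(chan Response)
	finished := make(chan string)

	// The consumer reports the collected text once streaming is done.
	go func() { finished <- collect(responses) }()

	// Fake producer standing in for the pipeline.
	for _, piece := range []string{"Hello", ", ", "world"} {
		responses <- Response{Text: piece}
	}
	responses <- Response{Done: true}

	// Block until the consumer has seen the Done response.
	fmt.Println(<-finished)
}
```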

Related Pages

Principle:Ollama_Ollama_MLXRunner_Architecture
