Implementation:Ollama Ollama MLXRunner Pipeline

From Leeroopedia
Knowledge Sources
Domains: MLX Runtime, Text Generation
Last Updated: 2025-02-15 00:00 GMT

Overview

Implements the text generation pipeline for the MLX runner, coordinating prompt processing, autoregressive token generation, sampling, and KV cache management.

Description

TextGenerationPipeline first looks up the nearest cached KV state for the input tokens, so that tokens already covered by the cache need not be reprocessed. It then processes the remaining prompt in chunks of up to 2048 tokens, evaluating the KV cache state after each chunk. Next it runs autoregressive generation: at each step it computes logits via Forward/Unembed, applies log-softmax to obtain logprobs, samples the next token, and streams the decoded text through the response channel. It uses AsyncEval for pipelining, so evaluation of the next step starts while the current token is still being processed. Decode reassembles UTF-8 characters whose bytes are split across token boundaries.
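The first two stages, cache lookup and chunked prompt processing, can be sketched in Go as below. This is a minimal illustration under stated assumptions: the cache is modeled as a plain list of previously processed token sequences, "nearest" is taken to mean the longest shared prefix, and `longestCachedPrefix` and `prefillChunks` are hypothetical helpers, not functions from x/mlxrunner. No model is actually run.

```go
package main

import "fmt"

// chunkSize mirrors the 2048-token prompt chunk limit described above.
const chunkSize = 2048

// commonPrefixLen counts the leading tokens shared by a and b.
func commonPrefixLen(a, b []int32) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// longestCachedPrefix (hypothetical) returns the index of the cached sequence
// sharing the longest prefix with prompt, or -1 if none shares any token,
// together with that prefix length.
func longestCachedPrefix(cached [][]int32, prompt []int32) (best, length int) {
	best = -1
	for i, c := range cached {
		if n := commonPrefixLen(c, prompt); n > length {
			best, length = i, n
		}
	}
	return best, length
}

// prefillChunks (hypothetical) splits tokens into consecutive chunks of at
// most chunkSize. A real runner would run the model on each chunk and
// evaluate the KV cache before starting the next one.
func prefillChunks(tokens []int32) [][]int32 {
	var chunks [][]int32
	for start := 0; start < len(tokens); start += chunkSize {
		end := start + chunkSize
		if end > len(tokens) {
			end = len(tokens)
		}
		chunks = append(chunks, tokens[start:end])
	}
	return chunks
}

func main() {
	cached := [][]int32{{1, 2, 3}, {1, 2, 9, 9}}
	prompt := make([]int32, 5003)
	copy(prompt, []int32{1, 2, 3})

	best, reused := longestCachedPrefix(cached, prompt)
	fmt.Printf("reuse cache entry %d for %d tokens\n", best, reused)

	// Only the uncached suffix of the prompt is processed, chunk by chunk.
	for i, c := range prefillChunks(prompt[reused:]) {
		fmt.Printf("chunk %d: %d tokens\n", i, len(c))
	}
}
```

Processing in bounded chunks caps the size of each intermediate computation, which is why a long prompt is not fed to the model in one pass.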

Usage

Serves as the core text generation loop, invoked for each completion request handled by the MLX inference server.

Code Reference

Source Location

  • Repository: Ollama
  • File: x/mlxrunner/pipeline.go
  • Lines: 1-126

Signature

func (r *Runner) TextGenerationPipeline(request Request) error
func (r Runner) Decode(sample int32, b *bytes.Buffer) string

Import

import "github.com/ollama/ollama/x/mlxrunner"

I/O Contract

Inputs

Name     Type     Required  Description
request  Request  Yes       Contains the prompt, sampling parameters, and response channel

Outputs

Name       Type           Description
error      error          Non-nil if the model is not loaded
(channel)  chan Response  Streaming responses sent to request.Responses

Usage Examples

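// Build a completion request (the Options struct type is elided in the source).
request := Request{
    TextCompletionsRequest: TextCompletionsRequest{
        Prompt:  "Hello, world!",
        Options: struct{ ... }{MaxTokens: 100, Temperature: 0.7},
    },
    Responses: make(chan Response),           // streamed output arrives here
    Pipeline:  runner.TextGenerationPipeline, // generation loop to run
    Sampler:   sample.New(0.7, 0, 0, 0),
}

// Print streamed text until a response marked Done arrives.
// Assumes the runner always sends a final Done response.
done := make(chan struct{})
go func() {
    defer close(done)
    for resp := range request.Responses {
        if resp.Done {
            break
        }
        fmt.Print(resp.Text)
    }
}()

// Queue the request on the runner, then wait for the stream to finish.
runner.Requests <- request
<-done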
