
Principle:Ollama Prompt Construction

From Leeroopedia
Knowledge Sources
Domains NLP, Prompt_Engineering
Last Updated 2026-02-14 00:00 GMT

Overview

A template-driven prompt construction mechanism that renders chat messages, tool definitions, and system prompts into model-specific prompt formats with context window truncation.

Description

Prompt Construction is the process of converting structured chat messages (system, user, assistant, tool) into the exact text format a specific language model expects. Different model families use different prompt formats: ChatML, Llama-style, Mistral-style, and others. This principle ensures that regardless of the chat API format used by the client, the prompt rendered for the model matches its training format.

The mechanism also handles context window management by tokenizing the rendered prompt and truncating older messages (while preserving system messages and the most recent user message) to fit within the model's maximum context length. For multimodal models, image tokens are accounted for in the context budget.
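The image accounting reduces to simple arithmetic against the context budget. The sketch below assumes the fixed 768-token-per-image figure stated in this article; the function name fitsContext is illustrative, not an Ollama API.

```go
package main

import "fmt"

// tokensPerImage is the fixed per-image budget described above;
// treat the value as illustrative.
const tokensPerImage = 768

// fitsContext reports whether a prompt's text tokens plus its
// image budget fit within the model's context window (num_ctx).
func fitsContext(textTokens, numImages, numCtx int) bool {
	return textTokens+numImages*tokensPerImage <= numCtx
}

func main() {
	fmt.Println(fitsContext(3000, 2, 4096)) // 3000 + 1536 = 4536 > 4096
	fmt.Println(fitsContext(2000, 2, 4096)) // 2000 + 1536 = 3536 <= 4096
}
```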

Usage

Use this principle when designing a multi-model inference server that must support diverse prompt formats. It applies whenever chat messages need to be formatted according to model-specific templates before being sent to the inference engine.

Theoretical Basis

The prompt construction algorithm:

  1. Template Selection: Each model carries a Go text/template string (from GGUF metadata or Modelfile) that defines its prompt format.
  2. Message Rendering: The template is executed with the full message history, tool definitions, and thinking mode settings.
  3. Tokenization: The rendered prompt is tokenized to count tokens.
  4. Truncation Loop: If the token count exceeds the model's context window (num_ctx), remove the oldest non-system messages one at a time and re-render until the prompt fits.
  5. Image Handling: For multimodal models, each image is budgeted as a fixed number of tokens (768) and image data is extracted for separate embedding.

Pseudo-code:

// Abstract prompt construction: drop the oldest non-system
// messages until the rendered prompt fits the context window.
func constructPrompt(messages, template, maxCtx) (string, []Image) {
    systemMsgs, rest := splitSystem(messages)
    var prompt string
    for i := 0; i < len(rest); i++ {
        prompt = template.Render(append(systemMsgs, rest[i:]...))
        tokens := tokenize(prompt)
        imageTokens := 768 * countImages(rest[i:])
        if len(tokens)+imageTokens <= maxCtx {
            return prompt, extractImages(rest[i:])
        }
    }
    return prompt, extractImages(rest[len(rest)-1:]) // keep at least the last message
}

Related Pages

Implemented By
