Principle:Turboderp org Exllamav2 Prompt Format Templating

From Leeroopedia
Knowledge Sources
Domains NLP, Prompt_Engineering, Chat
Last Updated 2026-02-15 00:00 GMT

Overview

Different language models are trained with specific prompt formats that structure conversations, and using the correct format is critical for eliciting proper model behavior.

Description

Instruction-tuned and chat-tuned language models are fine-tuned with specific prompt templates that delineate system instructions, user messages, and assistant responses. If the prompt format does not match what the model was trained on, the model may produce incoherent, repetitive, or off-topic output.

Common prompt formats include:

  • ChatML: Used by models like Qwen, Yi, and some Mistral fine-tunes:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
  • Llama 2: Used by Meta's Llama 2 chat models:
[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

Hello! [/INST]
  • Llama 3: Used by Meta's Llama 3 instruction models:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

Hello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>
  • Gemma: Used by Google's Gemma models:
<start_of_turn>user
Hello!<end_of_turn>
<start_of_turn>model
  • DeepSeek: Used by DeepSeek models with thinking/reasoning tokens.
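
The templates above can be expressed as plain format strings. The following is a minimal sketch, not taken from any library: the names `TEMPLATES` and `render_first_turn` are hypothetical, and the trailing blank line after the Llama 3 assistant header is an assumption based on Meta's published template rather than on the excerpt shown here. DeepSeek is omitted because no template is given for it.

```python
# Hypothetical single-turn templates, transcribed from the examples above.
TEMPLATES = {
    "chatml": (
        "<|im_start|>system\n{system}<|im_end|>\n"
        "<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    ),
    "llama2": "[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]",
    "llama3": (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        "{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        "{user}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    ),
    # The Gemma example above has no system turn, so {system} is unused here.
    "gemma": "<start_of_turn>user\n{user}<end_of_turn>\n<start_of_turn>model\n",
}


def render_first_turn(fmt: str, user: str,
                      system: str = "You are a helpful assistant.") -> str:
    """Render the first turn of a conversation in the named format."""
    if fmt not in TEMPLATES:
        raise ValueError(f"unknown prompt format: {fmt!r}")
    # str.format ignores keyword arguments the template does not use.
    return TEMPLATES[fmt].format(system=system, user=user)


if __name__ == "__main__":
    print(render_first_turn("gemma", "Hello!"))
```

The same user message produces a different string for each format, which is why a prompt built for one model family generally cannot be reused for another.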

A prompt format abstraction provides a model-agnostic interface for conversation management. The abstraction defines methods for:

  • System prompt: A default system instruction for the model.
  • First prompt: The template for the first user message (which may include the system prompt inline for some formats).
  • Subsequent prompts: The template for follow-up user messages in a multi-turn conversation.
  • Stop conditions: Which tokens or strings signal the end of the assistant's response.
  • Encoding options: Tokenizer flags for the prompt, such as whether to encode special tokens and whether to prepend a BOS token.
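
To illustrate how the first-prompt and subsequent-prompt templates combine in a multi-turn conversation, here is a minimal sketch using ChatML. The `ChatTemplate` dataclass and `build_context` helper are hypothetical names introduced for this example. It assumes that generation stops at the stop string and that the stop string is not kept in the reply, so the subsequent-turn template begins by closing the previous assistant turn.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ChatTemplate:
    first: str        # first-turn template, uses {system} and {user}
    subsequent: str   # follow-up template, uses {user} only
    stop_strings: Tuple[str, ...]


# Assumption: assistant replies are stored without the trailing <|im_end|>,
# so the follow-up template re-adds it before opening the next user turn.
CHATML = ChatTemplate(
    first=(
        "<|im_start|>system\n{system}<|im_end|>\n"
        "<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    ),
    subsequent=(
        "<|im_end|>\n"
        "<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    ),
    stop_strings=("<|im_end|>",),
)


def build_context(template: ChatTemplate, system: str,
                  turns: List[Tuple[str, Optional[str]]]) -> str:
    """Concatenate turns; each turn is (user_message, reply or None)."""
    parts = []
    for index, (user, reply) in enumerate(turns):
        if index == 0:
            parts.append(template.first.format(system=system, user=user))
        else:
            parts.append(template.subsequent.format(user=user))
        if reply is not None:
            parts.append(reply)
    return "".join(parts)


if __name__ == "__main__":
    print(build_context(CHATML, "You are a helpful assistant.",
                        [("Hello!", "Hi there."), ("How are you?", None)]))
```

Leaving the last reply as `None` yields a context that ends with the open assistant header, which is the point at which the model is asked to generate.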

Usage

Use prompt format templating when:

  • Building chat or instruction-following applications
  • Supporting multiple model families with a single codebase
  • Implementing multi-turn conversation management
  • Ensuring correct stop conditions for each model type

Theoretical Basis

Prompt Format Interface

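class PromptFormat:
    """Interface sketch. The method bodies use ChatML as the illustrative format."""

    # Model description and identification
    description: str = "ChatML"

    # Template methods:
    def default_system_prompt(self) -> str:
        # Return the model's default system prompt
        return "You are a helpful assistant."

    def first_prompt(self, system_prompt: str, user_message: str) -> str:
        # Format the first turn, including the system prompt
        return (
            f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
            f"<|im_start|>user\n{user_message}<|im_end|>\n"
            "<|im_start|>assistant\n"
        )

    def subs_prompt(self, user_message: str) -> str:
        # Format a subsequent turn (no system prompt). The leading <|im_end|>
        # closes the previous assistant reply, assuming generation stopped
        # at that string and it was not kept in the reply.
        return (
            "<|im_end|>\n"
            f"<|im_start|>user\n{user_message}<|im_end|>\n"
            "<|im_start|>assistant\n"
        )

    def stop_conditions(self, tokenizer) -> list:
        # Return stop tokens/strings for this format
        return [tokenizer.eos_token_id, "<|im_end|>"]

    def encoding_options(self) -> dict:
        # Return encoding flags
        return {"add_bos": True, "encode_special_tokens": True}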
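
A stop-condition list like the one returned above mixes token IDs and strings, so a generation loop has to treat the two kinds differently. The sketch below shows one way to do that; `split_stop_conditions` and `truncate_at_stop` are hypothetical helpers for illustration and are not part of any library API.

```python
from typing import Iterable, List, Set, Tuple, Union


def split_stop_conditions(
    conditions: Iterable[Union[int, str]],
) -> Tuple[Set[int], List[str]]:
    """Separate stop token IDs (ints) from stop strings."""
    conditions = list(conditions)
    token_ids = {c for c in conditions if isinstance(c, int)}
    strings = [c for c in conditions if isinstance(c, str)]
    return token_ids, strings


def truncate_at_stop(text: str, stop_strings: Iterable[str]) -> str:
    """Cut decoded text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strings:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


if __name__ == "__main__":
    ids, strings = split_stop_conditions([2, "<|im_end|>"])
    print(ids, strings)
    print(truncate_at_stop("Hi there.<|im_end|>\n<|im_start|>user", strings))
```

Token IDs can be compared against each sampled token directly, while stop strings have to be matched against the decoded text because a stop string may span several tokens.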

Format Selection

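# The correct format is typically determined by the model's metadata
# or explicitly specified by the user. The architecture name alone is
# not always enough: Llama 2 and Llama 3 share LlamaForCausalLM but
# use different prompt formats.
#
# Common mapping: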
#   LlamaForCausalLM (Llama 2) -> PromptFormat_llama
#   LlamaForCausalLM (Llama 3) -> PromptFormat_llama3
#   MistralForCausalLM         -> PromptFormat_mistral
#   Qwen2ForCausalLM           -> PromptFormat_chatml
#   Phi3ForCausalLM            -> PromptFormat_phi3
#   GemmaForCausalLM           -> PromptFormat_gemma
#   DeepseekForCausalLM        -> PromptFormat_deepseek

# Each format ensures the model sees the exact token structure
# it was trained on, which is critical for instruction following.
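
The selection logic described in the comments above can be sketched as a lookup with an explicit override. This is an illustrative assumption about how such a selector might look, not the library's actual mechanism; `FORMAT_BY_ARCH`, `AMBIGUOUS_ARCHS`, and `select_format` are hypothetical names, and the format names are short labels rather than real class references.

```python
from typing import Optional

# Architectures that map to exactly one format (subset of the mapping above).
FORMAT_BY_ARCH = {
    "MistralForCausalLM": "mistral",
    "Qwen2ForCausalLM": "chatml",
    "GemmaForCausalLM": "gemma",
}

# Architectures shared by model generations with different formats.
AMBIGUOUS_ARCHS = {"LlamaForCausalLM": ("llama", "llama3")}

KNOWN_FORMATS = set(FORMAT_BY_ARCH.values()) | {"llama", "llama3"}


def select_format(architecture: str, override: Optional[str] = None) -> str:
    """Pick a format name; an explicit user choice always wins."""
    if override is not None:
        if override not in KNOWN_FORMATS:
            raise ValueError(f"unknown prompt format: {override!r}")
        return override
    if architecture in AMBIGUOUS_ARCHS:
        choices = ", ".join(AMBIGUOUS_ARCHS[architecture])
        raise ValueError(
            f"{architecture} is ambiguous; specify one of: {choices}"
        )
    if architecture not in FORMAT_BY_ARCH:
        raise ValueError(f"no default format for {architecture!r}")
    return FORMAT_BY_ARCH[architecture]


if __name__ == "__main__":
    print(select_format("Qwen2ForCausalLM"))
    print(select_format("LlamaForCausalLM", override="llama3"))
```

Raising an error for ambiguous or unknown architectures, instead of guessing, avoids silently feeding a model a format it was not trained on.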
