Implementation:Turboderp org Exllamav2 PromptFormat Interface

Knowledge Sources	ExLlamaV2
Domains	NLP, Prompt_Engineering, Chat
Last Updated	2026-02-15 00:00 GMT

Overview

A concrete pattern for model-specific prompt formatting: one base class plus 18+ subclasses, shipped as example code in the exllamav2 repository.

Description

PromptFormat is a base class defined in the exllamav2 examples that provides a standard interface for model-specific prompt templating. Each model family has a subclass that implements the correct token structure for that model's chat/instruction format.

The base class defines five methods that subclasses must implement:

default_system_prompt(): Returns the model's default system instruction text.
first_prompt(system_prompt, user_message): Formats the first conversation turn, typically embedding the system prompt.
subs_prompt(user_message): Formats subsequent conversation turns.
stop_conditions(tokenizer): Returns a list of token IDs and/or strings that signal the end of the assistant's response.
encoding_options(): Returns a dictionary of encoding flags (add_bos, encode_special_tokens, etc.).
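To illustrate the five-method contract, here is a minimal sketch of a subclass for a made-up plain-text format. It assumes the simplified interface described on this page; the base class below is a stand-in, and `PromptFormat_plain`, `FakeTokenizer`, and the chosen flag values are hypothetical and not part of exllamav2.

```python
class PromptFormat:
    """Stand-in for the base class described above (assumed interface)."""

    def default_system_prompt(self) -> str:
        raise NotImplementedError

    def first_prompt(self, system_prompt: str, user_message: str) -> str:
        raise NotImplementedError

    def subs_prompt(self, user_message: str) -> str:
        raise NotImplementedError

    def stop_conditions(self, tokenizer) -> list:
        raise NotImplementedError

    def encoding_options(self) -> dict:
        raise NotImplementedError


class PromptFormat_plain(PromptFormat):
    """Hypothetical format: 'User:' / 'Assistant:' prefixes, no special tokens."""

    description = "Plain (hypothetical)"

    def default_system_prompt(self) -> str:
        return "You are a helpful assistant."

    def first_prompt(self, system_prompt: str, user_message: str) -> str:
        # The system prompt is embedded once, ahead of the first user turn.
        return f"{system_prompt}\n\nUser: {user_message}\nAssistant:"

    def subs_prompt(self, user_message: str) -> str:
        # Later turns only add the new user message and reopen the assistant turn.
        return f"\nUser: {user_message}\nAssistant:"

    def stop_conditions(self, tokenizer) -> list:
        # Mixed list: one token ID and one stop string.
        return [tokenizer.eos_token_id, "\nUser:"]

    def encoding_options(self) -> dict:
        return {"add_bos": True, "encode_special_tokens": False}


class FakeTokenizer:
    """Hypothetical tokenizer stub exposing only eos_token_id."""
    eos_token_id = 2


fmt = PromptFormat_plain()
first = fmt.first_prompt(fmt.default_system_prompt(), "Hi")
stops = fmt.stop_conditions(FakeTokenizer())
```

Raising `NotImplementedError` in the stand-in base class is one way to make a missing override fail loudly at call time; the upstream file may handle this differently.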

Available subclasses include:

Subclass	Models	Format Style
PromptFormat_raw	Any	No formatting, raw text
PromptFormat_llama	Llama 2	[INST] / <<SYS>> markers
PromptFormat_llama3	Llama 3	Header ID tokens
PromptFormat_codellama	Code Llama	Code-specific [INST] variant
PromptFormat_chatml	Qwen, Yi, etc.	<|im_start|>/<|im_end|>
PromptFormat_mistral	Mistral	[INST] without <<SYS>>
PromptFormat_phi3	Phi-3	<|system|>/<|user|>/<|assistant|>
PromptFormat_gemma	Gemma	<start_of_turn>/<end_of_turn>
PromptFormat_deepseek	DeepSeek	DeepSeek-specific tokens
PromptFormat_cohere	Command R	Cohere chat format

A prompt_formats dictionary maps format names to their classes for easy lookup.
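The lookup could be wrapped in a small helper so that a mistyped format name produces a readable error. This is a sketch only: the two-entry registry and the `get_prompt_format` helper are hypothetical stand-ins that mirror the shape of `prompt_formats`, not code from the repository.

```python
class PromptFormat_raw:
    description = "Raw text"


class PromptFormat_chatml:
    description = "ChatML"


# Hypothetical two-entry registry: format name -> class (not instance).
prompt_formats = {
    "raw": PromptFormat_raw,
    "chatml": PromptFormat_chatml,
}


def get_prompt_format(name: str):
    """Instantiate the format registered under `name`, or raise a clear error."""
    try:
        fmt_class = prompt_formats[name]
    except KeyError:
        known = ", ".join(sorted(prompt_formats))
        raise ValueError(
            f"Unknown prompt format {name!r}; available: {known}"
        ) from None
    return fmt_class()


fmt = get_prompt_format("chatml")
```

Storing classes rather than instances means each caller gets a fresh object, which matches the `prompt_formats["chatml"]()` call style used elsewhere on this page.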

Note: This is example code, not an installable API. Copy the pattern into your own project or import directly from the examples directory.

Usage

Use PromptFormat subclasses when building chat applications:

Select the appropriate subclass for your model
Use first_prompt() for the initial message
Use subs_prompt() for follow-up messages
Use stop_conditions() to configure generation termination
Use encoding_options() to set tokenizer flags
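The steps above can be tied together in a small session object. The sketch below assumes the simplified interface shown on this page; `ChatSession`, `StubTokenizer`, and `stub_generate` are hypothetical, and the generate callable stands in for whatever generator your application actually uses.

```python
class PromptFormat_chatml:
    """Copy of the ChatML example from this page, trimmed to what the sketch needs."""

    def default_system_prompt(self):
        return "You are a helpful assistant."

    def first_prompt(self, system_prompt, user_message):
        return (
            f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
            f"<|im_start|>user\n{user_message}<|im_end|>\n"
            f"<|im_start|>assistant\n"
        )

    def subs_prompt(self, user_message):
        return (
            f"<|im_end|>\n"
            f"<|im_start|>user\n{user_message}<|im_end|>\n"
            f"<|im_start|>assistant\n"
        )

    def stop_conditions(self, tokenizer):
        return [tokenizer.eos_token_id, "<|im_end|>"]

    def encoding_options(self):
        return {"add_bos": False, "encode_special_tokens": True}


class ChatSession:
    """Hypothetical helper that keeps the running context for one conversation."""

    def __init__(self, fmt, tokenizer, generate, system_prompt=None):
        self.fmt = fmt
        # generate is an assumed callable: (prompt, stops, options) -> str
        self.generate = generate
        self.system_prompt = system_prompt or fmt.default_system_prompt()
        self.stops = fmt.stop_conditions(tokenizer)   # step 4
        self.options = fmt.encoding_options()         # step 5
        self.context = ""

    def say(self, user_message):
        if not self.context:
            # Step 2: the first turn embeds the system prompt.
            self.context = self.fmt.first_prompt(self.system_prompt, user_message)
        else:
            # Step 3: later turns are appended to the existing context.
            self.context += self.fmt.subs_prompt(user_message)
        reply = self.generate(self.context, self.stops, self.options)
        self.context += reply
        return reply


class StubTokenizer:
    eos_token_id = 2


def stub_generate(prompt, stops, options):
    """Stand-in for a real model call; always returns the same text."""
    return "Hello!"


session = ChatSession(PromptFormat_chatml(), StubTokenizer(), stub_generate)  # step 1
first_reply = session.say("Hi")
second_reply = session.say("Bye")
```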

Code Reference

Source Location

Repository: exllamav2
File: examples/chat_prompts.py
Lines: L2-31 (base class), L721-741 (prompt_formats dict)

Base Class Signature

class PromptFormat:

    botname: str = "Assistant"
    username: str = "User"
    description: str = "Undefined prompt format"

    def default_system_prompt(self) -> str:
        ...

    def first_prompt(self, system_prompt: str, user_message: str) -> str:
        ...

    def subs_prompt(self, user_message: str) -> str:
        ...

    def stop_conditions(self, tokenizer) -> list:
        ...

    def encoding_options(self) -> dict:
        ...

ChatML Subclass Example

class PromptFormat_chatml(PromptFormat):

    description = "ChatML"

    def default_system_prompt(self):
        return "You are a helpful assistant."

    def first_prompt(self, system_prompt, user_message):
        return (
            f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
            f"<|im_start|>user\n{user_message}<|im_end|>\n"
            f"<|im_start|>assistant\n"
        )

    def subs_prompt(self, user_message):
        return (
            f"<|im_end|>\n"
            f"<|im_start|>user\n{user_message}<|im_end|>\n"
            f"<|im_start|>assistant\n"
        )

    def stop_conditions(self, tokenizer):
        return [tokenizer.eos_token_id, "<|im_end|>"]

    def encoding_options(self):
        return {"add_bos": False, "encode_special_tokens": True}
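One detail in the example above is that subs_prompt() begins with <|im_end|>. A plausible reading (an assumption, not stated by the source) is that generation halts on the <|im_end|> stop condition before that token reaches the context, so the next turn has to close the assistant block itself. The sketch below reuses the same template strings as plain functions and counts unmatched <|im_start|> markers; `open_turns` is a hypothetical helper.

```python
IM_START, IM_END = "<|im_start|>", "<|im_end|>"


def first_prompt(system_prompt, user_message):
    return (
        f"{IM_START}system\n{system_prompt}{IM_END}\n"
        f"{IM_START}user\n{user_message}{IM_END}\n"
        f"{IM_START}assistant\n"
    )


def subs_prompt(user_message):
    # The leading IM_END closes the assistant turn left open by the previous prompt.
    return (
        f"{IM_END}\n"
        f"{IM_START}user\n{user_message}{IM_END}\n"
        f"{IM_START}assistant\n"
    )


def open_turns(context):
    """Number of <|im_start|> markers that have no matching <|im_end|>."""
    return context.count(IM_START) - context.count(IM_END)


context = first_prompt("You are a helpful assistant.", "What is Python?")
after_first = open_turns(context)

# The assistant text is appended without a closing token ...
context += "Python is a programming language."
# ... so the follow-up turn closes it before opening the next user turn.
context += subs_prompt("Show me an example.")
after_second = open_turns(context)
```

Under this reading, exactly one turn (the assistant's) is open whenever the context is handed to the model.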

Import

# From the examples directory (not a pip-installable module);
# requires the exllamav2 repository root to be on sys.path:
from examples.chat_prompts import PromptFormat, prompt_formats

# Or copy the pattern into your own code
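Because the module lives in the repository rather than in an installed package, the import above only resolves when the checkout's root directory is importable. A minimal sketch of one way to arrange that follows; the `add_repo_to_path` helper and the `./exllamav2` checkout location are hypothetical.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_root) -> Path:
    """Put a repository checkout at the front of sys.path (idempotent)."""
    root = Path(repo_root).resolve()
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root


# Hypothetical checkout location; adjust to wherever the repository was cloned.
root = add_repo_to_path("./exllamav2")

# With a real checkout at that path, this import would then be expected to work:
# from examples.chat_prompts import PromptFormat, prompt_formats
```

Copying the relevant classes into your own project avoids the path manipulation entirely, at the cost of tracking upstream changes by hand.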

I/O Contract

first_prompt Inputs

Name	Type	Required	Description
system_prompt	str	Yes	System instruction text to embed in the first turn
user_message	str	Yes	The user's first message

first_prompt Output

Name	Type	Description
formatted_prompt	str	Complete formatted prompt string with system prompt and first user message, ready for tokenization

subs_prompt Inputs

Name	Type	Required	Description
user_message	str	Yes	The user's follow-up message

subs_prompt Output

Name	Type	Description
formatted_prompt	str	Formatted follow-up turn string to append to the conversation

stop_conditions Inputs

Name	Type	Required	Description
tokenizer	ExLlamaV2Tokenizer	Yes	Tokenizer instance for resolving token IDs

stop_conditions Output

Name	Type	Description
conditions	list	List of token IDs (int) and/or stop strings (str) that terminate generation
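Since the returned list mixes integers and strings, a consumer has to tell the two kinds apart. The sketch below shows one way to do that and to trim generated text at the earliest stop string; both helpers are hypothetical illustrations of the contract, not exllamav2 functions.

```python
def split_stop_conditions(conditions):
    """Separate a mixed stop list into token IDs (set) and stop strings (list)."""
    token_ids = {c for c in conditions if isinstance(c, int)}
    strings = [c for c in conditions if isinstance(c, str)]
    return token_ids, strings


def truncate_at_stop(text, stop_strings):
    """Cut `text` at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Shape of the ChatML result shown on this page, with 2 as a made-up EOS token ID.
token_ids, strings = split_stop_conditions([2, "<|im_end|>"])
clean = truncate_at_stop("Hello<|im_end|>\nextra", strings)
```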

Usage Examples

Using a Prompt Format for Chat

from examples.chat_prompts import prompt_formats

# Select format by name
fmt = prompt_formats["chatml"]()

# Format first turn
prompt = fmt.first_prompt(
    system_prompt="You are a helpful coding assistant.",
    user_message="Write a Python hello world program.",
)
# Result: "<|im_start|>system\nYou are a helpful coding assistant.<|im_end|>\n
#          <|im_start|>user\nWrite a Python hello world program.<|im_end|>\n
#          <|im_start|>assistant\n"

# Get encoding options
opts = fmt.encoding_options()
# Result: {"add_bos": False, "encode_special_tokens": True}

# Get stop conditions
# (tokenizer is an ExLlamaV2Tokenizer instance created elsewhere)
stops = fmt.stop_conditions(tokenizer)
# Result: [tokenizer.eos_token_id, "<|im_end|>"]

Multi-Turn Conversation

fmt = prompt_formats["llama3"]()

# First turn
conversation = fmt.first_prompt(
    system_prompt=fmt.default_system_prompt(),
    user_message="What is Python?",
)

# Generate response...
response = "Python is a high-level programming language..."

# Append response and format next turn
conversation += response
conversation += fmt.subs_prompt("Can you show me an example?")

# Generate next response...

Listing Available Formats

from examples.chat_prompts import prompt_formats

for name, fmt_class in prompt_formats.items():
    fmt = fmt_class()
    print(f"{name}: {fmt.description}")

Related Pages

Implements Principle

Principle:Turboderp_org_Exllamav2_Prompt_Format_Templating
