Overview
Concrete pattern for implementing model-specific prompt formatting through a base class and 18+ subclasses, provided as example code by exllamav2.
Description
PromptFormat is a base class defined in the exllamav2 examples that provides a standard interface for model-specific prompt templating. Each model family has a subclass that implements the correct token structure for that model's chat/instruction format.
The base class defines five methods that subclasses must implement:
- default_system_prompt(): Returns the model's default system instruction text.
- first_prompt(system_prompt, user_message): Formats the first conversation turn, typically embedding the system prompt.
- subs_prompt(user_message): Formats subsequent conversation turns.
- stop_conditions(tokenizer): Returns a list of token IDs and/or strings that signal the end of the assistant's response.
- encoding_options(): Returns a dictionary of encoding flags (add_bos, encode_special_tokens, etc.).
Available subclasses include:
| Subclass | Models | Format Style |
| PromptFormat_raw | Any | No formatting, raw text |
| PromptFormat_llama | Llama 2 | [INST] / <<SYS>> markers |
| PromptFormat_llama3 | Llama 3 | Header ID tokens |
| PromptFormat_codellama | Code Llama | Code-specific [INST] variant |
| PromptFormat_chatml | Qwen, Yi, etc. | <|im_start|>/<|im_end|> |
| PromptFormat_mistral | Mistral | [INST] without <<SYS>> |
| PromptFormat_phi3 | Phi-3 | <|system|>/<|user|>/<|assistant|> |
| PromptFormat_gemma | Gemma | <start_of_turn>/<end_of_turn> |
| PromptFormat_deepseek | DeepSeek | DeepSeek-specific tokens |
| PromptFormat_cohere | Command R | Cohere chat format |
A prompt_formats dictionary maps format names to their classes for easy lookup.
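The lookup described above can be sketched as a plain dictionary from names to classes. The two stand-in classes and the `get_prompt_format` helper below are hypothetical, written only for illustration; they are not part of exllamav2.

```python
class PromptFormat:
    description = "Undefined prompt format"


class PromptFormat_raw(PromptFormat):
    description = "Raw text, no formatting"


class PromptFormat_chatml(PromptFormat):
    description = "ChatML"


# Registry mapping short format names to classes (not instances).
prompt_formats = {
    "raw": PromptFormat_raw,
    "chatml": PromptFormat_chatml,
}


def get_prompt_format(name: str) -> PromptFormat:
    """Instantiate the format registered under `name` (hypothetical helper)."""
    try:
        return prompt_formats[name]()
    except KeyError:
        available = ", ".join(sorted(prompt_formats))
        raise ValueError(
            f"Unknown prompt format {name!r}; choose from: {available}"
        ) from None


fmt = get_prompt_format("chatml")
print(fmt.description)
```

Storing classes rather than instances keeps each conversation's format object independent, at the cost of one extra call at lookup time.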
Note: This is example code, not an installable API. Copy the pattern into your own project or import directly from the examples directory.
Usage
Use PromptFormat subclasses when building chat applications:
- Select the appropriate subclass for your model
- Use first_prompt() for the initial message
- Use subs_prompt() for follow-up messages
- Use stop_conditions() to configure generation termination
- Use encoding_options() to set tokenizer flags
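The first-turn part of the steps above can be sketched end to end as follows. This is a minimal sketch under stated assumptions: `DemoChatML` and `DemoTokenizer` are stand-ins written against the interface described on this page, and `build_generation_request` with its dictionary layout is hypothetical, not an exllamav2 API.

```python
class DemoChatML:
    """Stand-in format following the interface described on this page."""

    def default_system_prompt(self):
        return "You are a helpful assistant."

    def first_prompt(self, system_prompt, user_message):
        return (
            f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
            f"<|im_start|>user\n{user_message}<|im_end|>\n"
            f"<|im_start|>assistant\n"
        )

    def stop_conditions(self, tokenizer):
        return [tokenizer.eos_token_id, "<|im_end|>"]

    def encoding_options(self):
        return {"add_bos": False, "encode_special_tokens": True}


class DemoTokenizer:
    """Stand-in exposing only the attribute used here (assumed value)."""
    eos_token_id = 2


def build_generation_request(fmt, tokenizer, user_message, system_prompt=None):
    """Collect what a generator would need for the first turn (hypothetical)."""
    if system_prompt is None:
        system_prompt = fmt.default_system_prompt()
    return {
        "prompt": fmt.first_prompt(system_prompt, user_message),
        "stop_conditions": fmt.stop_conditions(tokenizer),
        "encoding_options": fmt.encoding_options(),
    }


request = build_generation_request(DemoChatML(), DemoTokenizer(), "Hello!")
print(request["prompt"])
```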
Code Reference
Source Location
- Repository: exllamav2
- File: examples/chat_prompts.py
- Lines: L2-31 (base class), L721-741 (prompt_formats dict)
Base Class Signature
class PromptFormat:
    botname: str = "Assistant"
    username: str = "User"
    description: str = "Undefined prompt format"

    def default_system_prompt(self) -> str:
        ...

    def first_prompt(self, system_prompt: str, user_message: str) -> str:
        ...

    def subs_prompt(self, user_message: str) -> str:
        ...

    def stop_conditions(self, tokenizer) -> list:
        ...

    def encoding_options(self) -> dict:
        ...
ChatML Subclass Example
class PromptFormat_chatml(PromptFormat):
    description = "ChatML"

    def default_system_prompt(self):
        return "You are a helpful assistant."

    def first_prompt(self, system_prompt, user_message):
        return (
            f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
            f"<|im_start|>user\n{user_message}<|im_end|>\n"
            f"<|im_start|>assistant\n"
        )

    def subs_prompt(self, user_message):
        return (
            f"<|im_end|>\n"
            f"<|im_start|>user\n{user_message}<|im_end|>\n"
            f"<|im_start|>assistant\n"
        )

    def stop_conditions(self, tokenizer):
        return [tokenizer.eos_token_id, "<|im_end|>"]

    def encoding_options(self):
        return {"add_bos": False, "encode_special_tokens": True}
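For comparison, the same interface could be implemented for the Llama 2 `[INST]` / `<<SYS>>` style listed in the subclass table. This is a sketch only, not the code shipped in examples/chat_prompts.py: the class name `SketchLlama2Format`, the minimal base class, the default system prompt text and the encoding flags chosen here are all assumptions for illustration.

```python
class PromptFormat:
    """Minimal stand-in for the base class described on this page."""
    description = "Undefined prompt format"


class SketchLlama2Format(PromptFormat):
    description = "Llama 2 chat (sketch)"

    def default_system_prompt(self):
        # Assumed placeholder text, not taken from exllamav2.
        return "You are a helpful assistant."

    def first_prompt(self, system_prompt, user_message):
        # The system prompt sits inside the first [INST] block.
        return (
            f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
            f"{user_message} [/INST]"
        )

    def subs_prompt(self, user_message):
        # Later turns carry only the user message.
        return f"[INST] {user_message} [/INST]"

    def stop_conditions(self, tokenizer):
        # This style has no text marker closing a reply, so stop on EOS only.
        return [tokenizer.eos_token_id]

    def encoding_options(self):
        # Assumption: the tokenizer adds BOS, since the template has no <s>.
        return {"add_bos": True, "encode_special_tokens": False}


fmt = SketchLlama2Format()
print(fmt.first_prompt("Be brief.", "Hi"))
```

Unlike the ChatML example, the stop list here holds a single token ID and no stop string, which is why stop_conditions() takes the tokenizer as an argument.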
Import
# From the examples directory (not a pip-installable module; the exllamav2
# repository root must be on sys.path for this import to resolve):
from examples.chat_prompts import PromptFormat, prompt_formats
# Or copy the pattern into your own code
I/O Contract
first_prompt Inputs
| Name | Type | Required | Description |
| system_prompt | str | Yes | System instruction text to embed in the first turn |
| user_message | str | Yes | The user's first message |
first_prompt Output
| Name | Type | Description |
| formatted_prompt | str | Complete formatted prompt string with system prompt and first user message, ready for tokenization |
subs_prompt Inputs
| Name | Type | Required | Description |
| user_message | str | Yes | The user's follow-up message |
subs_prompt Output
| Name | Type | Description |
| formatted_prompt | str | Formatted follow-up turn string to append to the conversation |
stop_conditions Inputs
| Name | Type | Required | Description |
| tokenizer | ExLlamaV2Tokenizer | Yes | Tokenizer instance for resolving token IDs |
stop_conditions Output
| Name | Type | Description |
| conditions | list | List of token IDs (int) and/or stop strings (str) that terminate generation |
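Because the returned list can mix token IDs and strings, a consumer may need to separate the two kinds before use. The helpers below are a hypothetical sketch of that step; the names `split_stop_conditions` and `truncate_at_stop` are not from exllamav2.

```python
def split_stop_conditions(conditions):
    """Separate a mixed stop list into token IDs and stop strings."""
    token_ids = {c for c in conditions if isinstance(c, int)}
    strings = [c for c in conditions if isinstance(c, str)]
    return token_ids, strings


def truncate_at_stop(text, stop_strings):
    """Cut `text` at the earliest occurrence of any stop string."""
    cut = len(text)
    for s in stop_strings:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


stop_ids, stop_strings = split_stop_conditions([2, "<|im_end|>"])
print(stop_ids, stop_strings)
print(truncate_at_stop("Hello there<|im_end|>\nextra", stop_strings))
```

Token IDs can be checked as each token is sampled, whereas stop strings have to be matched against the decoded text, which is why the two kinds are handled separately here.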
Usage Examples
Using a Prompt Format for Chat
from examples.chat_prompts import prompt_formats

# Select format by name
fmt = prompt_formats["chatml"]()

# Format first turn
prompt = fmt.first_prompt(
    system_prompt="You are a helpful coding assistant.",
    user_message="Write a Python hello world program.",
)
# Result: "<|im_start|>system\nYou are a helpful coding assistant.<|im_end|>\n
# <|im_start|>user\nWrite a Python hello world program.<|im_end|>\n
# <|im_start|>assistant\n"

# Get encoding options
opts = fmt.encoding_options()
# Result: {"add_bos": False, "encode_special_tokens": True}

# Get stop conditions (tokenizer is an ExLlamaV2Tokenizer instance created elsewhere)
stops = fmt.stop_conditions(tokenizer)
# Result: [eos_token_id, "<|im_end|>"]
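One way to consume the options dictionary is to unpack it as keyword arguments to an encode call. The sketch below uses a stand-in `FakeTokenizer` so that it runs on its own; whether a real tokenizer's encode method accepts exactly these keyword names is an assumption to check against the exllamav2 version in use.

```python
class FakeTokenizer:
    """Stand-in tokenizer: one ID per character, optional BOS (ID 1)."""
    bos_token_id = 1

    def encode(self, text, add_bos=False, encode_special_tokens=False):
        ids = [ord(ch) for ch in text]
        # encode_special_tokens is accepted but ignored by this toy tokenizer.
        return [self.bos_token_id] + ids if add_bos else ids


opts = {"add_bos": False, "encode_special_tokens": True}
tokenizer = FakeTokenizer()

# The dictionary keys line up with the keyword parameters of encode().
ids = tokenizer.encode("Hi", **opts)
ids_with_bos = tokenizer.encode("Hi", **{**opts, "add_bos": True})
print(ids, ids_with_bos)
```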
Multi-Turn Conversation
from examples.chat_prompts import prompt_formats

fmt = prompt_formats["llama3"]()

# First turn
conversation = fmt.first_prompt(
    system_prompt=fmt.default_system_prompt(),
    user_message="What is Python?",
)

# Generate response...
response = "Python is a high-level programming language..."

# Append response and format next turn
conversation += response
conversation += fmt.subs_prompt("Can you show me an example?")

# Generate next response...
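The bookkeeping in this example (first turn versus later turns, appending each response) can be wrapped in a small helper. `ChatSession` and `DemoFormat` below are hypothetical names for illustration, and `DemoFormat` uses made-up `<user>` / `<bot>` markers rather than any real model's template.

```python
class DemoFormat:
    """Stand-in format with invented markers, following this page's interface."""

    def default_system_prompt(self):
        return "You are a helpful assistant."

    def first_prompt(self, system_prompt, user_message):
        return f"<sys>{system_prompt}</sys>\n<user>{user_message}</user>\n<bot>"

    def subs_prompt(self, user_message):
        return f"</bot>\n<user>{user_message}</user>\n<bot>"


class ChatSession:
    """Accumulates the conversation text across turns."""

    def __init__(self, fmt, system_prompt=None):
        self.fmt = fmt
        self.system_prompt = system_prompt or fmt.default_system_prompt()
        self.context = ""

    def add_user_message(self, message):
        # An empty context means this is the first turn.
        if not self.context:
            self.context = self.fmt.first_prompt(self.system_prompt, message)
        else:
            self.context += self.fmt.subs_prompt(message)
        return self.context

    def add_response(self, response):
        self.context += response


session = ChatSession(DemoFormat())
session.add_user_message("What is Python?")
session.add_response("A programming language.")
print(session.add_user_message("Show me an example."))
```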
Listing Available Formats
from examples.chat_prompts import prompt_formats

for name, fmt_class in prompt_formats.items():
    fmt = fmt_class()
    print(f"{name}: {fmt.description}")