Implementation:Mlc ai Mlc llm Llama Templates
Overview
The Llama Templates module defines conversation templates for the Meta Llama family of models within MLC LLM. Located at python/mlc_llm/conversation_template/llama.py, this file registers six conversation templates covering multiple generations and variants of Llama models: llama-4, llama-3_1, llama-3, llama-2, codellama_completion, and codellama_instruct. Each template captures the distinct prompt formatting conventions, special tokens, and generation stop conditions for its corresponding model.
Purpose
Different Llama model generations use fundamentally different prompt formatting schemes. Llama 2 uses the [INST] and <<SYS>> tags, Llama 3 introduced header-based tokens, Llama 3.1 added new stop tokens, and Llama 4 changed the naming convention of those header tokens. This module ensures that each variant is correctly formatted during inference by the MLC LLM engine.
File Location
python/mlc_llm/conversation_template/llama.py
Imports and Dependencies
from mlc_llm.protocol.conversation_protocol import Conversation, MessagePlaceholders
from .registry import ConvTemplateRegistry
Registered Templates
llama-4
The Llama 4 template renames the header tokens used by Llama 3.1: <|header_start|> and <|header_end|> replace <|start_header_id|> and <|end_header_id|>, and each turn ends with <|eot|> instead of <|eot_id|>.
ConvTemplateRegistry.register_conv_template(
    Conversation(
        name="llama-4",
        system_template="",
        system_message="",
        roles={
            "user": "<|header_start|>user",
            "assistant": "<|header_start|>assistant",
            "tool": "<|header_start|>ipython",
        },
        seps=["<|eot|>"],
        role_content_sep="<|header_end|>\n\n",
        role_empty_sep="<|header_end|>\n\n",
        stop_str=[],
        stop_token_ids=[200001, 200007, 200008],
        system_prefix_token_ids=[200000],
        add_role_after_system_message=False,
    )
)
Key characteristics:
- Three roles are defined: user, assistant, and tool (ipython), enabling tool-use capabilities.
- stop_token_ids include 200001 (<|end_of_text|>), 200007 (<|eom|>), and 200008 (<|eot|>).
- system_prefix_token_ids is [200000] (<|begin_of_text|>).
- add_role_after_system_message is set to False.
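To make the role and separator fields concrete, the following is a minimal sketch of how a single-turn prompt could be assembled from the llama-4 values above. The helper render_prompt is hypothetical and deliberately simplified; it is not the MLC LLM prompt builder, which also handles the system message and prepends the system_prefix_token_ids as token IDs rather than text.

```python
from typing import List, Optional, Tuple

# Field values copied from the llama-4 template above.
ROLES = {
    "user": "<|header_start|>user",
    "assistant": "<|header_start|>assistant",
    "tool": "<|header_start|>ipython",
}
SEP = "<|eot|>"
ROLE_CONTENT_SEP = "<|header_end|>\n\n"
ROLE_EMPTY_SEP = "<|header_end|>\n\n"


def render_prompt(messages: List[Tuple[str, Optional[str]]]) -> str:
    """Hypothetical helper: join (role, content) pairs into one prompt string."""
    parts = []
    for role, content in messages:
        if content is None:
            # Open turn: emit only the role header so the model continues from here.
            parts.append(ROLES[role] + ROLE_EMPTY_SEP)
        else:
            # Completed turn: header, content, then the turn separator.
            parts.append(ROLES[role] + ROLE_CONTENT_SEP + content + SEP)
    return "".join(parts)


prompt = render_prompt([("user", "Hello"), ("assistant", None)])
print(repr(prompt))
```

The final assistant entry has no content, so the sketch ends the prompt with an open assistant header for the model to complete.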
llama-3_1
Llama 3.1 extends Llama 3 with additional stop token IDs to support end-of-message signaling.
ConvTemplateRegistry.register_conv_template(
    Conversation(
        name="llama-3_1",
        system_template=(
            "<|start_header_id|>system<|end_header_id|>\n\n"
            f"{MessagePlaceholders.SYSTEM.value}<|eot_id|>"
        ),
        system_message="You are a helpful, respectful and honest assistant.",
        roles={
            "user": "<|start_header_id|>user",
            "assistant": "<|start_header_id|>assistant",
            "tool": "<|start_header_id|>ipython",
        },
        seps=["<|eot_id|>"],
        role_content_sep="<|end_header_id|>\n\n",
        role_empty_sep="<|end_header_id|>\n\n",
        stop_str=[],
        stop_token_ids=[128001, 128008, 128009],
        system_prefix_token_ids=[128000],
        add_role_after_system_message=True,
    )
)
Key differences from Llama 3:
- stop_token_ids includes three tokens: 128001 (<|end_of_text|>), 128008 (<|eom_id|>), and 128009 (<|eot_id|>).
- stop_str is empty (stopping is handled entirely by token IDs).
- An additional tool role is defined for function calling support.
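Stopping purely on token IDs can be illustrated with a short sketch. The function truncate_at_stop below is a hypothetical helper, not part of MLC LLM, and the non-stop token IDs in the example are arbitrary placeholders; only the three stop IDs come from the template above.

```python
from typing import FrozenSet, Iterable, List

# Stop token IDs copied from the llama-3_1 template above.
LLAMA_3_1_STOP_TOKEN_IDS: FrozenSet[int] = frozenset({128001, 128008, 128009})


def truncate_at_stop(
    token_ids: Iterable[int],
    stop_ids: FrozenSet[int] = LLAMA_3_1_STOP_TOKEN_IDS,
) -> List[int]:
    """Hypothetical helper: keep tokens up to, but excluding, the first stop ID."""
    kept: List[int] = []
    for token_id in token_ids:
        if token_id in stop_ids:
            break
        kept.append(token_id)
    return kept


# 11, 22 and 33 are arbitrary placeholder IDs; 128009 is <|eot_id|>.
generated = [11, 22, 128009, 33]
kept = truncate_at_stop(generated)
print(kept)
```

Because the check operates on IDs, no decoded text has to be scanned for marker strings, which is consistent with stop_str being empty for this template.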
llama-3
The Llama 3 template uses header-based special tokens, a significant departure from Llama 2.
ConvTemplateRegistry.register_conv_template(
    Conversation(
        name="llama-3",
        system_template=(
            "<|start_header_id|>system<|end_header_id|>\n\n"
            f"{MessagePlaceholders.SYSTEM.value}<|eot_id|>"
        ),
        system_message="You are a helpful, respectful and honest assistant.",
        roles={
            "user": "<|start_header_id|>user",
            "assistant": "<|start_header_id|>assistant",
        },
        seps=["<|eot_id|>"],
        role_content_sep="<|end_header_id|>\n\n",
        role_empty_sep="<|end_header_id|>\n\n",
        stop_str=["<|end_of_text|>", "<|eot_id|>"],
        stop_token_ids=[128001, 128009],
        system_prefix_token_ids=[128000],
        add_role_after_system_message=True,
    )
)
Key characteristics:
- Uses <|start_header_id|> / <|end_header_id|> tokens for role demarcation.
- stop_str includes the text forms of both stop markers.
- Only user and assistant roles (no tool role).
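As an illustration of text-level stopping, the sketch below cuts decoded output at the earliest occurrence of any stop string. The helper cut_at_stop_str is hypothetical and only shows the idea; how the MLC LLM engine applies stop_str internally is not described in this document.

```python
from typing import Sequence

# Stop strings copied from the llama-3 template above.
LLAMA_3_STOP_STRS = ("<|end_of_text|>", "<|eot_id|>")


def cut_at_stop_str(text: str, stop_strs: Sequence[str] = LLAMA_3_STOP_STRS) -> str:
    """Hypothetical helper: return text up to the earliest stop string, if any."""
    cut = len(text)
    for marker in stop_strs:
        index = text.find(marker)
        if index != -1:
            # Track the earliest position at which any marker appears.
            cut = min(cut, index)
    return text[:cut]


visible = cut_at_stop_str("Paris is the capital.<|eot_id|>extra")
print(visible)
```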
llama-2
The Llama 2 template follows the original Meta instruction format with [INST] and <<SYS>> markers.
ConvTemplateRegistry.register_conv_template(
    Conversation(
        name="llama-2",
        system_template=f"[INST] <<SYS>>\n{MessagePlaceholders.SYSTEM.value}\n<</SYS>>\n\n",
        system_message="You are a helpful, respectful and honest assistant.",
        roles={"user": "<s>[INST]", "assistant": "[/INST]", "tool": "[INST]"},
        seps=[" ", " </s>"],
        role_content_sep=" ",
        role_empty_sep=" ",
        stop_str=["[INST]"],
        stop_token_ids=[2],
        system_prefix_token_ids=[1],
        add_role_after_system_message=False,
    )
)
The Llama 2 format wraps the system prompt inside <<SYS>> / <</SYS>> tags, which are themselves nested within the first [INST] block.
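The nesting can be seen by filling the system template and appending the first user message. This is a simplified sketch under stated assumptions: SYSTEM_PLACEHOLDER is a stand-in for MessagePlaceholders.SYSTEM.value, first_turn is a hypothetical helper, and the exact concatenation performed by the engine may differ.

```python
# Stand-in for MessagePlaceholders.SYSTEM.value (assumed, not imported from MLC LLM).
SYSTEM_PLACEHOLDER = "{system_message}"

# Field values copied from the llama-2 template above.
SYSTEM_TEMPLATE = "[INST] <<SYS>>\n" + SYSTEM_PLACEHOLDER + "\n<</SYS>>\n\n"
ASSISTANT_ROLE = "[/INST]"
SEP = " "
ROLE_EMPTY_SEP = " "


def first_turn(system_message: str, user_message: str) -> str:
    """Hypothetical helper: system block, first user message, open assistant turn."""
    system_block = SYSTEM_TEMPLATE.replace(SYSTEM_PLACEHOLDER, system_message)
    # add_role_after_system_message=False: the user text follows the system block
    # directly, because the system template has already opened the [INST] block.
    return system_block + user_message + SEP + ASSISTANT_ROLE + ROLE_EMPTY_SEP


prompt = first_turn("Be brief.", "Hi")
print(repr(prompt))
```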
codellama_completion
A minimal template for CodeLlama in pure completion mode (no instruction following).
ConvTemplateRegistry.register_conv_template(
    Conversation(
        name="codellama_completion",
        system_template=f"{MessagePlaceholders.SYSTEM.value}",
        system_message="",
        roles={"user": "", "assistant": ""},
        seps=[""],
        role_content_sep="",
        role_empty_sep="",
        stop_str=["</s>"],
        stop_token_ids=[2],
        system_prefix_token_ids=[1],
    )
)
Both roles are set to empty strings, meaning no role markers are injected. Input is passed through directly as a continuation prompt.
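A short sketch shows why this template acts as a pass-through: with every role and separator empty, concatenating the pieces returns the input unchanged. The helper render_completion_prompt is hypothetical and mirrors only the field values above, not the engine's actual code path.

```python
# Field values copied from the codellama_completion template above.
USER_ROLE = ""
ASSISTANT_ROLE = ""
SEP = ""
ROLE_CONTENT_SEP = ""
ROLE_EMPTY_SEP = ""


def render_completion_prompt(code_prefix: str) -> str:
    """Hypothetical helper: apply the (empty) role markers around the input."""
    user_turn = USER_ROLE + ROLE_CONTENT_SEP + code_prefix + SEP
    assistant_open = ASSISTANT_ROLE + ROLE_EMPTY_SEP
    return user_turn + assistant_open


code_prefix = "def add(a, b):\n    return "
prompt = render_completion_prompt(code_prefix)
print(prompt == code_prefix)
```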
codellama_instruct
The instruction-following variant of CodeLlama uses the same [INST] / [/INST] structure as Llama 2 but without a system message wrapper.
ConvTemplateRegistry.register_conv_template(
    Conversation(
        name="codellama_instruct",
        system_template=f"{MessagePlaceholders.SYSTEM.value}",
        system_message="",
        roles={"user": "[INST]", "assistant": "[/INST]"},
        seps=[" "],
        role_content_sep=" ",
        role_empty_sep=" ",
        stop_str=["</s>"],
        stop_token_ids=[2],
        system_prefix_token_ids=[1],
    )
)
Template Evolution Summary
| Template | Token Style | System Message | Tool Role | Stop Token IDs | BOS Token ID |
|---|---|---|---|---|---|
| llama-4 | <\|header_start\|> | (empty) | Yes | [200001, 200007, 200008] | 200000 |
| llama-3_1 | <\|start_header_id\|> | Helpful assistant | Yes | [128001, 128008, 128009] | 128000 |
| llama-3 | <\|start_header_id\|> | Helpful assistant | No | [128001, 128009] | 128000 |
| llama-2 | [INST] / <<SYS>> | Helpful assistant | Yes | [2] | 1 |
| codellama_completion | (none) | (empty) | No | [2] | 1 |
| codellama_instruct | [INST] | (empty) | No | [2] | 1 |
Relationship to Other Modules
- Template Registry -- All templates are registered with ConvTemplateRegistry.register_conv_template().
- Conversation Protocol -- Templates instantiate the Conversation class from mlc_llm.protocol.conversation_protocol.
- The reference documentation for Llama 3 formatting is linked in the source code: Meta Llama3 README and tokenizer.py.
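The register-then-look-up pattern behind the registry can be sketched in a few lines. TemplateSketch and RegistrySketch are hypothetical stand-ins written for illustration; they are not the MLC LLM classes and carry only a subset of the fields shown in this document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TemplateSketch:
    """Hypothetical stand-in holding a few of the template fields."""

    name: str
    stop_token_ids: List[int] = field(default_factory=list)
    system_prefix_token_ids: List[int] = field(default_factory=list)


class RegistrySketch:
    """Hypothetical name-to-template map; not the MLC LLM registry."""

    _templates: Dict[str, TemplateSketch] = {}

    @classmethod
    def register(cls, template: TemplateSketch) -> None:
        # Design choice of this sketch: reject duplicate names outright.
        if template.name in cls._templates:
            raise ValueError(f"template {template.name!r} is already registered")
        cls._templates[template.name] = template

    @classmethod
    def get(cls, name: str) -> Optional[TemplateSketch]:
        # Unknown names yield None instead of raising.
        return cls._templates.get(name)


# Values taken from the summary table above.
RegistrySketch.register(TemplateSketch("llama-3", [128001, 128009], [128000]))
RegistrySketch.register(TemplateSketch("llama-2", [2], [1]))

llama3 = RegistrySketch.get("llama-3")
print(llama3)
```

Keying templates by name lets the engine select prompt formatting from a single string in the model configuration, which is the role the name field plays in each registration above.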