Implementation: mlc-ai/mlc-llm Llama Templates

Overview

The Llama Templates module defines conversation templates for the Meta Llama family of models within MLC LLM. Located at python/mlc_llm/conversation_template/llama.py, this file registers six conversation templates covering multiple generations and variants of Llama models: llama-4, llama-3_1, llama-3, llama-2, codellama_completion, and codellama_instruct. Each template captures the distinct prompt formatting conventions, special tokens, and generation stop conditions for its corresponding model.

Purpose

Different Llama model generations use fundamentally different prompt formatting schemes. Llama 2 uses [INST] and <<SYS>> tags, Llama 3 introduced header-based special tokens, Llama 3.1 added a new stop token (<|eom_id|>) and a tool role, and Llama 4 renamed the header tokens (for example, <|start_header_id|> became <|header_start|>). This module ensures that prompts for each variant are formatted as the model expects when the MLC LLM engine runs inference.

File Location

python/mlc_llm/conversation_template/llama.py

Imports and Dependencies

from mlc_llm.protocol.conversation_protocol import Conversation, MessagePlaceholders
from .registry import ConvTemplateRegistry

Registered Templates

llama-4

ConvTemplateRegistry.register_conv_template(
    Conversation(
        name="llama-4",
        system_template="",
        system_message="",
        roles={
            "user": "<|header_start|>user",
            "assistant": "<|header_start|>assistant",
            "tool": "<|header_start|>ipython",
        },
        seps=["<|eot|>"],
        role_content_sep="<|header_end|>\n\n",
        role_empty_sep="<|header_end|>\n\n",
        stop_str=[],
        stop_token_ids=[200001, 200007, 200008],  # <|end_of_text|>, <|eom|>, <|eot|>
        system_prefix_token_ids=[200000],  # <|begin_of_text|>
        add_role_after_system_message=False,
    )
)

Key characteristics:

Three roles are defined: user, assistant, and tool (ipython), enabling tool-use capabilities.
stop_token_ids include 200001 (<|end_of_text|>), 200007 (<|eom|>), and 200008 (<|eot|>).
system_prefix_token_ids is [200000] (<|begin_of_text|>).
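add_role_after_system_message is set to False. Both system_template and system_message are empty strings, so no system block is rendered ahead of the first user turn.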
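To make the llama-4 fields concrete, here is a minimal sketch of how they could combine into a single-turn prompt. It assumes, rather than quotes from MLC LLM, that each completed turn is rendered as role marker + role_content_sep + content + separator, that the pending assistant turn is rendered as role marker + role_empty_sep, and that system_prefix_token_ids are prepended as raw token IDs instead of text. The function render_llama4 is hypothetical.

```python
from typing import List, Optional, Tuple

# Field values copied from the llama-4 template above.
ROLES = {
    "user": "<|header_start|>user",
    "assistant": "<|header_start|>assistant",
    "tool": "<|header_start|>ipython",
}
SEP = "<|eot|>"
ROLE_CONTENT_SEP = "<|header_end|>\n\n"
ROLE_EMPTY_SEP = "<|header_end|>\n\n"
SYSTEM_PREFIX_TOKEN_IDS = [200000]  # <|begin_of_text|>


def render_llama4(messages: List[Tuple[str, Optional[str]]]) -> Tuple[List[int], str]:
    """Hypothetical renderer returning (prefix token IDs, prompt text).

    A message whose content is None marks the turn the model should generate.
    """
    parts = []
    for role, content in messages:
        if content is None:
            # Open header for the reply; generation continues from here.
            parts.append(ROLES[role] + ROLE_EMPTY_SEP)
        else:
            parts.append(ROLES[role] + ROLE_CONTENT_SEP + content + SEP)
    # system_template is "", so no system block is emitted.
    return SYSTEM_PREFIX_TOKEN_IDS, "".join(parts)


prefix_ids, prompt = render_llama4([("user", "What is 2 + 2?"), ("assistant", None)])
print(prefix_ids)
print(repr(prompt))
```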

llama-3_1

Llama 3.1 extends Llama 3 with an additional stop token ID, 128008 (<|eom_id|>), for end-of-message signaling.

ConvTemplateRegistry.register_conv_template(
    Conversation(
        name="llama-3_1",
        system_template=(
            "<|start_header_id|>system<|end_header_id|>\n\n"
            f"{MessagePlaceholders.SYSTEM.value}<|eot_id|>"
        ),
        system_message="You are a helpful, respectful and honest assistant.",
        roles={
            "user": "<|start_header_id|>user",
            "assistant": "<|start_header_id|>assistant",
            "tool": "<|start_header_id|>ipython",
        },
        seps=["<|eot_id|>"],
        role_content_sep="<|end_header_id|>\n\n",
        role_empty_sep="<|end_header_id|>\n\n",
        stop_str=[],
        stop_token_ids=[128001, 128008, 128009],  # <|end_of_text|>, <|eom_id|>, <|eot_id|>
        system_prefix_token_ids=[128000],  # <|begin_of_text|>
        add_role_after_system_message=True,
    )
)

Key differences from Llama 3:

stop_token_ids includes three tokens: 128001 (<|end_of_text|>), 128008 (<|eom_id|>), and 128009 (<|eot_id|>).
stop_str is empty (stop is handled entirely by token IDs).
An additional tool role is defined for function calling support.
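Because stop_str is empty in llama-3_1, generation stops purely on token IDs. The sketch below contrasts the llama-3 and llama-3_1 stop lists on a made-up token stream. It assumes the engine halts as soon as it samples any ID in stop_token_ids and drops that token from the output; is_stop_token and generate_until_stop are hypothetical helpers, not MLC LLM functions. Under that assumption, the llama-3 list would not halt on <|eom_id|> (128008), the token Llama 3.1 uses to end a message that hands off to a tool.

```python
from typing import Iterable, List

# Stop-token ID lists copied from the llama-3 and llama-3_1 templates.
LLAMA3_STOP_IDS = [128001, 128009]  # <|end_of_text|>, <|eot_id|>
LLAMA3_1_STOP_IDS = [128001, 128008, 128009]  # <|end_of_text|>, <|eom_id|>, <|eot_id|>
EOM_ID = 128008  # <|eom_id|>


def is_stop_token(token_id: int, stop_token_ids: List[int]) -> bool:
    """Hypothetical check: stop when the sampled token is in the template's list."""
    return token_id in stop_token_ids


def generate_until_stop(sampled_ids: Iterable[int], stop_token_ids: List[int]) -> List[int]:
    """Consume a pre-recorded token stream until a stop token appears.

    The stop token itself is not included in the returned output.
    """
    out = []
    for tok in sampled_ids:
        if is_stop_token(tok, stop_token_ids):
            break
        out.append(tok)
    return out


# Made-up token IDs: three ordinary tokens, then <|eom_id|>, then more tokens.
stream = [9906, 1917, 0, EOM_ID, 791, 1121]
print(generate_until_stop(stream, LLAMA3_1_STOP_IDS))  # stops at <|eom_id|>
print(generate_until_stop(stream, LLAMA3_STOP_IDS))  # runs past it
```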

llama-3

The Llama 3 template uses header-based special tokens, a significant departure from Llama 2.

ConvTemplateRegistry.register_conv_template(
    Conversation(
        name="llama-3",
        system_template=(
            "<|start_header_id|>system<|end_header_id|>\n\n"
            f"{MessagePlaceholders.SYSTEM.value}<|eot_id|>"
        ),
        system_message="You are a helpful, respectful and honest assistant.",
        roles={
            "user": "<|start_header_id|>user",
            "assistant": "<|start_header_id|>assistant",
        },
        seps=["<|eot_id|>"],
        role_content_sep="<|end_header_id|>\n\n",
        role_empty_sep="<|end_header_id|>\n\n",
        stop_str=["<|end_of_text|>", "<|eot_id|>"],
        stop_token_ids=[128001, 128009],  # <|end_of_text|>, <|eot_id|>
        system_prefix_token_ids=[128000],  # <|begin_of_text|>
        add_role_after_system_message=True,
    )
)

Key characteristics:

Uses <|start_header_id|> / <|end_header_id|> tokens for role demarcation.
stop_str lists the text forms of both stop markers (<|end_of_text|> and <|eot_id|>), alongside their token IDs in stop_token_ids.
Defines only the user and assistant roles (no tool role).
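The sketch below renders a one-turn llama-3 prompt with the default system message, to show how system_template, the system-message placeholder, and add_role_after_system_message=True fit together. It is an approximation under assumed rendering rules, not MLC LLM's code: the string "{system_message}" is assumed to be the value of MessagePlaceholders.SYSTEM.value, and render_llama3 is hypothetical. The BOS token 128000 would be prepended separately as a token ID.

```python
from typing import List, Optional, Tuple

SYSTEM_PLACEHOLDER = "{system_message}"  # assumed value of MessagePlaceholders.SYSTEM.value

# Field values copied from the llama-3 template above.
SYSTEM_TEMPLATE = (
    "<|start_header_id|>system<|end_header_id|>\n\n"
    f"{SYSTEM_PLACEHOLDER}<|eot_id|>"
)
SYSTEM_MESSAGE = "You are a helpful, respectful and honest assistant."
ROLES = {
    "user": "<|start_header_id|>user",
    "assistant": "<|start_header_id|>assistant",
}
SEP = "<|eot_id|>"
ROLE_SEP = "<|end_header_id|>\n\n"  # role_content_sep and role_empty_sep are identical


def render_llama3(messages: List[Tuple[str, Optional[str]]]) -> str:
    """Hypothetical renderer for the llama-3 field values (BOS not included)."""
    parts = [SYSTEM_TEMPLATE.replace(SYSTEM_PLACEHOLDER, SYSTEM_MESSAGE)]
    for role, content in messages:
        # add_role_after_system_message=True: the first user turn still gets its header.
        if content is None:
            parts.append(ROLES[role] + ROLE_SEP)
        else:
            parts.append(ROLES[role] + ROLE_SEP + content + SEP)
    return "".join(parts)


print(render_llama3([("user", "Hi!"), ("assistant", None)]))
```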

llama-2

The Llama 2 template follows the original Meta instruction format with [INST] and <<SYS>> markers.

ConvTemplateRegistry.register_conv_template(
    Conversation(
        name="llama-2",
        system_template=f"[INST] <<SYS>>\n{MessagePlaceholders.SYSTEM.value}\n<</SYS>>\n\n",
        system_message="You are a helpful, respectful and honest assistant.",
        roles={"user": "<s>[INST]", "assistant": "[/INST]", "tool": "[INST]"},
        seps=[" ", " </s>"],
        role_content_sep=" ",
        role_empty_sep=" ",
        stop_str=["[INST]"],
        stop_token_ids=[2],  # </s> (EOS)
        system_prefix_token_ids=[1],  # <s> (BOS)
        add_role_after_system_message=False,
    )
)

The Llama 2 format wraps the system prompt inside <<SYS>> / <</SYS>> tags, which are themselves nested within the first [INST] block.
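The interplay between the system block, add_role_after_system_message=False, and the two-element seps list is easiest to see in a multi-turn example. This sketch assumes, without quoting MLC LLM's code, that the first seps entry closes non-assistant turns and the second closes assistant turns, and that with add_role_after_system_message=False the first message after the system block is emitted without its role marker, because the system template already opened [INST]. render_llama2 is hypothetical, "{system_message}" is an assumed placeholder value, and BOS (<s>, ID 1) comes from system_prefix_token_ids rather than the text.

```python
from typing import List, Optional, Tuple

SYSTEM_PLACEHOLDER = "{system_message}"  # assumed value of MessagePlaceholders.SYSTEM.value

# Field values copied from the llama-2 template above.
SYSTEM_TEMPLATE = f"[INST] <<SYS>>\n{SYSTEM_PLACEHOLDER}\n<</SYS>>\n\n"
ROLES = {"user": "<s>[INST]", "assistant": "[/INST]", "tool": "[INST]"}
SEPS = [" ", " </s>"]  # index 0: non-assistant turns, index 1: assistant turns
ROLE_SEP = " "  # role_content_sep and role_empty_sep are identical
ADD_ROLE_AFTER_SYSTEM_MESSAGE = False


def render_llama2(system_message: str, messages: List[Tuple[str, Optional[str]]]) -> str:
    """Hypothetical renderer; the BOS token (ID 1) would be prepended separately."""
    parts = [SYSTEM_TEMPLATE.replace(SYSTEM_PLACEHOLDER, system_message)]
    for i, (role, content) in enumerate(messages):
        sep = SEPS[1] if role == "assistant" else SEPS[0]
        if content is None:
            parts.append(ROLES[role] + ROLE_SEP)
            continue
        # The system block already opened "[INST]", so the first message skips its marker.
        skip_role = i == 0 and not ADD_ROLE_AFTER_SYSTEM_MESSAGE
        prefix = "" if skip_role else ROLES[role] + ROLE_SEP
        parts.append(prefix + content + sep)
    return "".join(parts)


prompt = render_llama2(
    "Be brief.",
    [("user", "Hi"), ("assistant", "Hello!"), ("user", "Bye"), ("assistant", None)],
)
print(repr(prompt))
```

With BOS prepended, the text should line up with Meta's Llama 2 chat layout: `<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST] {answer} </s><s>[INST] {user} [/INST]`.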

codellama_completion

A minimal template for CodeLlama in pure completion mode (no instruction following).

ConvTemplateRegistry.register_conv_template(
    Conversation(
        name="codellama_completion",
        system_template=f"{MessagePlaceholders.SYSTEM.value}",
        system_message="",
        roles={"user": "", "assistant": ""},
        seps=[""],
        role_content_sep="",
        role_empty_sep="",
        stop_str=["</s>"],
        stop_token_ids=[2],
        system_prefix_token_ids=[1],
    )
)

Both roles are set to empty strings, meaning no role markers are injected. Input is passed through directly as a continuation prompt.
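Under assumed rendering rules (role marker + role_content_sep + content + separator per turn), the empty strings in codellama_completion collapse the prompt to the raw input. The system_template is only the placeholder and system_message is empty, so it contributes nothing either. render_completion is a hypothetical illustration, not MLC LLM code.

```python
from typing import List, Optional, Tuple

# Field values copied from the codellama_completion template above.
ROLES = {"user": "", "assistant": ""}
SEP = ""
ROLE_SEP = ""  # role_content_sep and role_empty_sep are both ""


def render_completion(messages: List[Tuple[str, Optional[str]]]) -> str:
    """Hypothetical renderer; every marker and separator is an empty string."""
    parts = []
    for role, content in messages:
        if content is None:
            parts.append(ROLES[role] + ROLE_SEP)
        else:
            parts.append(ROLES[role] + ROLE_SEP + content + SEP)
    return "".join(parts)


code_prefix = "def fibonacci(n):\n    "
prompt = render_completion([("user", code_prefix), ("assistant", None)])
print(prompt == code_prefix)  # True: the model simply continues the code
```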

codellama_instruct

The instruction-following variant of CodeLlama uses the same [INST] / [/INST] structure as Llama 2 but without a system message wrapper.

ConvTemplateRegistry.register_conv_template(
    Conversation(
        name="codellama_instruct",
        system_template=f"{MessagePlaceholders.SYSTEM.value}",
        system_message="",
        roles={"user": "[INST]", "assistant": "[/INST]"},
        seps=[" "],
        role_content_sep=" ",
        role_empty_sep=" ",
        stop_str=["</s>"],
        stop_token_ids=[2],
        system_prefix_token_ids=[1],
    )
)
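For comparison with llama-2, here is the same style of sketch for codellama_instruct. It rests on assumed rendering rules rather than MLC LLM's code, and render_codellama_instruct is hypothetical. Since system_message is empty and system_template is only the placeholder, no system text is produced, every turn carries its own marker, and the single seps entry closes both user and assistant turns.

```python
from typing import List, Optional, Tuple

# Field values copied from the codellama_instruct template above.
ROLES = {"user": "[INST]", "assistant": "[/INST]"}
SEP = " "  # seps has one entry, so user and assistant turns share it
ROLE_SEP = " "  # role_content_sep and role_empty_sep are identical


def render_codellama_instruct(messages: List[Tuple[str, Optional[str]]]) -> str:
    """Hypothetical renderer; BOS (ID 1) would be prepended as a token, not text."""
    parts = []
    for role, content in messages:
        if content is None:
            parts.append(ROLES[role] + ROLE_SEP)
        else:
            parts.append(ROLES[role] + ROLE_SEP + content + SEP)
    return "".join(parts)


print(repr(render_codellama_instruct([("user", "Reverse a list in Python."), ("assistant", None)])))
```

Under these assumed rules, completed assistant turns end with a plain space rather than the " </s>" used by llama-2.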

Template Evolution Summary

Template	Token Style	Default System Message	Tool Role	Stop Token IDs	BOS Token ID (system_prefix_token_ids)
llama-4	`<|header_start|>`	(empty)	Yes	`[200001, 200007, 200008]`	`200000`
llama-3_1	`<|start_header_id|>`	Helpful assistant	Yes	`[128001, 128008, 128009]`	`128000`
llama-3	`<|start_header_id|>`	Helpful assistant	No	`[128001, 128009]`	`128000`
llama-2	`[INST]` / `<<SYS>>`	Helpful assistant	Yes	`[2]`	`1`
codellama_completion	(none)	(empty)	No	`[2]`	`1`
codellama_instruct	`[INST]`	(empty)	No	`[2]`	`1`

Relationship to Other Modules

Template Registry -- All templates are registered with ConvTemplateRegistry.register_conv_template().
Conversation Protocol -- Templates instantiate the Conversation dataclass from mlc_llm.protocol.conversation_protocol.
The reference documentation for Llama 3 formatting is linked in the source code: Meta Llama3 README and tokenizer.py.
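The registration pattern can be illustrated with a minimal stand-in. MiniConversation and MiniConvTemplateRegistry are hypothetical; they assume a name-keyed dictionary populated at import time that rejects duplicate names unless an override is requested. The real ConvTemplateRegistry may differ in detail.

```python
from dataclasses import dataclass, field
from typing import ClassVar, Dict, List, Optional


@dataclass
class MiniConversation:
    """Hypothetical stand-in holding a few Conversation fields."""

    name: str
    stop_token_ids: List[int] = field(default_factory=list)
    system_prefix_token_ids: List[int] = field(default_factory=list)


class MiniConvTemplateRegistry:
    """Hypothetical global registry keyed by template name."""

    _templates: ClassVar[Dict[str, MiniConversation]] = {}

    @staticmethod
    def register(template: MiniConversation, override: bool = False) -> None:
        if template.name in MiniConvTemplateRegistry._templates and not override:
            raise ValueError(f"template {template.name!r} is already registered")
        MiniConvTemplateRegistry._templates[template.name] = template

    @staticmethod
    def get(name: str) -> Optional[MiniConversation]:
        return MiniConvTemplateRegistry._templates.get(name)


# Module-level registration, mirroring how llama.py registers its templates on import.
MiniConvTemplateRegistry.register(
    MiniConversation("llama-3", stop_token_ids=[128001, 128009], system_prefix_token_ids=[128000])
)
MiniConvTemplateRegistry.register(
    MiniConversation("llama-2", stop_token_ids=[2], system_prefix_token_ids=[1])
)

print(MiniConvTemplateRegistry.get("llama-3").stop_token_ids)  # [128001, 128009]
print(MiniConvTemplateRegistry.get("unknown-template"))  # None
```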
