Implementation:Mlc ai Mlc llm Llama Templates

From Leeroopedia


Overview

The Llama Templates module defines conversation templates for the Meta Llama family of models within MLC LLM. Located at python/mlc_llm/conversation_template/llama.py, this file registers six conversation templates covering multiple generations and variants of Llama models: llama-4, llama-3_1, llama-3, llama-2, codellama_completion, and codellama_instruct. Each template captures the distinct prompt formatting conventions, special tokens, and generation stop conditions for its corresponding model.

Purpose

Llama model generations differ substantially in how prompts are formatted. Llama 2 uses [INST] and <<SYS>> tags, Llama 3 introduced header-based tokens (<|start_header_id|> / <|end_header_id|>), Llama 3.1 added the <|eom_id|> stop token and a tool role, and Llama 4 renamed the header tokens to <|header_start|> / <|header_end|>. This module gives each variant its own template so that the MLC LLM engine formats prompts correctly during inference.

File Location

python/mlc_llm/conversation_template/llama.py

Imports and Dependencies

from mlc_llm.protocol.conversation_protocol import Conversation, MessagePlaceholders
from .registry import ConvTemplateRegistry

Registered Templates

llama-4

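The Llama 4 template renames the header tokens used by Llama 3.1: <|header_start|> and <|header_end|> replace <|start_header_id|> and <|end_header_id|>, and the turn separator becomes <|eot|> instead of <|eot_id|>. Unlike the Llama 3 templates, it is registered with an empty system template and an empty system message.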

ConvTemplateRegistry.register_conv_template(
    Conversation(
        name="llama-4",
        system_template="",
        system_message="",
        roles={
            "user": "<|header_start|>user",
            "assistant": "<|header_start|>assistant",
            "tool": "<|header_start|>ipython",
        },
        seps=["<|eot|>"],
        role_content_sep="<|header_end|>\n\n",
        role_empty_sep="<|header_end|>\n\n",
        stop_str=[],
        stop_token_ids=[200001, 200007, 200008],
        system_prefix_token_ids=[200000],
        add_role_after_system_message=False,
    )
)

Key characteristics:

  • Three roles are defined: user, assistant, and tool (ipython), enabling tool-use capabilities.
  • stop_token_ids include 200001 (<|end_of_text|>), 200007 (<|eom|>), and 200008 (<|eot|>).
  • system_prefix_token_ids is [200000] (<|begin_of_text|>).
  • add_role_after_system_message is set to False.
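As a rough illustration of how these fields could combine into a prompt string, the sketch below joins role marker, role_content_sep, content, and separator for one user turn and then opens an assistant turn. The render_llama4 helper is hypothetical and simplified; it is not the MLC LLM engine's rendering code, and it leaves out the BOS token because system_prefix_token_ids is given as a token ID rather than text.

```python
# Field values copied from the llama-4 template above.
ROLES = {
    "user": "<|header_start|>user",
    "assistant": "<|header_start|>assistant",
    "tool": "<|header_start|>ipython",
}
SEP = "<|eot|>"
ROLE_CONTENT_SEP = "<|header_end|>\n\n"
ROLE_EMPTY_SEP = "<|header_end|>\n\n"


def render_llama4(messages):
    """Hypothetical helper: join (role, content) pairs into one prompt string.

    A content of None marks the turn the model is expected to complete.
    """
    parts = []
    for role, content in messages:
        if content is None:
            # Open the turn but do not close it with a separator.
            parts.append(ROLES[role] + ROLE_EMPTY_SEP)
        else:
            parts.append(ROLES[role] + ROLE_CONTENT_SEP + content + SEP)
    return "".join(parts)


prompt = render_llama4([("user", "Hello"), ("assistant", None)])
print(repr(prompt))
```

Because the system template and system message are both empty, nothing precedes the first user header in this sketch.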

llama-3_1

Llama 3.1 extends Llama 3 with one additional stop token ID, 128008 (<|eom_id|>), for end-of-message signaling, and with a tool role mapped to ipython.

ConvTemplateRegistry.register_conv_template(
    Conversation(
        name="llama-3_1",
        system_template=(
            "<|start_header_id|>system<|end_header_id|>\n\n"
            f"{MessagePlaceholders.SYSTEM.value}<|eot_id|>"
        ),
        system_message="You are a helpful, respectful and honest assistant.",
        roles={
            "user": "<|start_header_id|>user",
            "assistant": "<|start_header_id|>assistant",
            "tool": "<|start_header_id|>ipython",
        },
        seps=["<|eot_id|>"],
        role_content_sep="<|end_header_id|>\n\n",
        role_empty_sep="<|end_header_id|>\n\n",
        stop_str=[],
        stop_token_ids=[128001, 128008, 128009],
        system_prefix_token_ids=[128000],
        add_role_after_system_message=True,
    )
)

Key differences from Llama 3:

  • stop_token_ids includes three tokens: 128001 (<|end_of_text|>), 128008 (<|eom_id|>), and 128009 (<|eot_id|>).
  • stop_str is empty (stop is handled entirely by token IDs).
  • An additional tool role is defined for function calling support.
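To make the effect of the extra stop token ID concrete, the sketch below cuts a token stream at the first stop ID under each template's stop_token_ids. The truncate_at_stop helper and the token stream are made up for illustration; how the engine actually applies stop_token_ids during decoding is not shown in this file.

```python
# stop_token_ids copied from the llama-3 and llama-3_1 templates.
LLAMA_3_STOP_IDS = {128001, 128009}
LLAMA_3_1_STOP_IDS = {128001, 128008, 128009}


def truncate_at_stop(token_ids, stop_token_ids):
    """Hypothetical helper: keep tokens up to, not including, the first stop ID."""
    kept = []
    for token_id in token_ids:
        if token_id in stop_token_ids:
            break
        kept.append(token_id)
    return kept


# Made-up token stream in which 128008 (<|eom_id|>) appears mid-stream.
stream = [9906, 1917, 128008, 42]
with_3_1 = truncate_at_stop(stream, LLAMA_3_1_STOP_IDS)
with_3 = truncate_at_stop(stream, LLAMA_3_STOP_IDS)
print(with_3_1, with_3)
```

Under the llama-3 set the stream is not cut at 128008, which is the gap the llama-3_1 template closes.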

llama-3

The Llama 3 template uses header-based special tokens, a significant departure from Llama 2.

ConvTemplateRegistry.register_conv_template(
    Conversation(
        name="llama-3",
        system_template=(
            "<|start_header_id|>system<|end_header_id|>\n\n"
            f"{MessagePlaceholders.SYSTEM.value}<|eot_id|>"
        ),
        system_message="You are a helpful, respectful and honest assistant.",
        roles={
            "user": "<|start_header_id|>user",
            "assistant": "<|start_header_id|>assistant",
        },
        seps=["<|eot_id|>"],
        role_content_sep="<|end_header_id|>\n\n",
        role_empty_sep="<|end_header_id|>\n\n",
        stop_str=["<|end_of_text|>", "<|eot_id|>"],
        stop_token_ids=[128001, 128009],
        system_prefix_token_ids=[128000],
        add_role_after_system_message=True,
    )
)

Key characteristics:

  • Uses <|start_header_id|> / <|end_header_id|> tokens for role demarcation.
  • stop_str lists the text forms of both stop markers, <|end_of_text|> and <|eot_id|>, in addition to their token IDs in stop_token_ids.
  • Only user and assistant roles (no tool role).
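String-based stopping can be pictured as trimming decoded text at the earliest stop string. The sketch below is one way to do that, under the assumption that stop strings are matched against decoded output; trim_at_stop_str is a hypothetical helper, not a function from MLC LLM.

```python
# stop_str copied from the llama-3 template.
STOP_STR = ["<|end_of_text|>", "<|eot_id|>"]


def trim_at_stop_str(text, stop_strs):
    """Hypothetical helper: cut text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strs:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


trimmed = trim_at_stop_str("Paris.<|eot_id|>leftover", STOP_STR)
untouched = trim_at_stop_str("No stop marker here", STOP_STR)
print(repr(trimmed), repr(untouched))
```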

llama-2

The Llama 2 template follows the original Meta instruction format with [INST] and <<SYS>> markers.

ConvTemplateRegistry.register_conv_template(
    Conversation(
        name="llama-2",
        system_template=f"[INST] <<SYS>>\n{MessagePlaceholders.SYSTEM.value}\n<</SYS>>\n\n",
        system_message="You are a helpful, respectful and honest assistant.",
        roles={"user": "<s>[INST]", "assistant": "[/INST]", "tool": "[INST]"},
        seps=[" ", " </s>"],
        role_content_sep=" ",
        role_empty_sep=" ",
        stop_str=["[INST]"],
        stop_token_ids=[2],
        system_prefix_token_ids=[1],
        add_role_after_system_message=False,
    )
)

The Llama 2 format wraps the system prompt inside <<SYS>> / <</SYS>> tags, which are themselves nested within the first [INST] block.
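The nesting can be seen by rendering a first turn by hand. The sketch below is a simplified illustration, not the engine's code: render_first_turn is a hypothetical helper, the placeholder string is an assumed stand-in for MessagePlaceholders.SYSTEM.value, and add_role_after_system_message=False is read here as meaning that the first user message follows the system block without its own role marker.

```python
# Assumed stand-in for MessagePlaceholders.SYSTEM.value.
SYSTEM_PLACEHOLDER = "{system_message}"

# Field values copied from the llama-2 template above.
SYSTEM_TEMPLATE = "[INST] <<SYS>>\n" + SYSTEM_PLACEHOLDER + "\n<</SYS>>\n\n"
ROLES = {"user": "<s>[INST]", "assistant": "[/INST]"}
USER_SEP = " "  # first entry of seps
ROLE_EMPTY_SEP = " "


def render_first_turn(system_message, user_message):
    """Hypothetical helper: system block, first user message, open assistant turn."""
    system_block = SYSTEM_TEMPLATE.replace(SYSTEM_PLACEHOLDER, system_message)
    # The system block already opened [INST], so no user role marker is added.
    user_block = user_message + USER_SEP
    assistant_open = ROLES["assistant"] + ROLE_EMPTY_SEP
    return system_block + user_block + assistant_open


prompt = render_first_turn("Be concise.", "Hi")
print(repr(prompt))
```

The system text and the first user message end up inside a single [INST] ... [/INST] span.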

codellama_completion

A minimal template for CodeLlama in pure completion mode (no instruction following).

ConvTemplateRegistry.register_conv_template(
    Conversation(
        name="codellama_completion",
        system_template=f"{MessagePlaceholders.SYSTEM.value}",
        system_message="",
        roles={"user": "", "assistant": ""},
        seps=[""],
        role_content_sep="",
        role_empty_sep="",
        stop_str=["</s>"],
        stop_token_ids=[2],
        system_prefix_token_ids=[1],
    )
)

Both roles are set to empty strings, meaning no role markers are injected. Input is passed through directly as a continuation prompt.
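A small sketch of why this amounts to a pass-through: with every marker and separator empty, concatenating them around the input changes nothing. The render_completion helper is hypothetical and only mimics the concatenation; it is not taken from MLC LLM.

```python
# Field values copied from the codellama_completion template above.
ROLES = {"user": "", "assistant": ""}
SEP = ""
ROLE_CONTENT_SEP = ""
ROLE_EMPTY_SEP = ""


def render_completion(code_prefix):
    """Hypothetical helper: wrap the input in the (empty) role markers."""
    user_turn = ROLES["user"] + ROLE_CONTENT_SEP + code_prefix + SEP
    assistant_open = ROLES["assistant"] + ROLE_EMPTY_SEP
    return user_turn + assistant_open


prefix = "def fib(n):\n    "
rendered = render_completion(prefix)
print(rendered == prefix)
```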

codellama_instruct

The instruction-following variant of CodeLlama uses the same [INST] / [/INST] structure as Llama 2, but without the <<SYS>> system-message wrapper: the system template is just the bare system message, which is empty by default.

ConvTemplateRegistry.register_conv_template(
    Conversation(
        name="codellama_instruct",
        system_template=f"{MessagePlaceholders.SYSTEM.value}",
        system_message="",
        roles={"user": "[INST]", "assistant": "[/INST]"},
        seps=[" "],
        role_content_sep=" ",
        role_empty_sep=" ",
        stop_str=["</s>"],
        stop_token_ids=[2],
        system_prefix_token_ids=[1],
    )
)

Template Evolution Summary

Template              Token Style          System Message     Tool Role  Stop Token IDs            BOS Token ID
llama-4               <|header_start|>     (empty)            Yes        [200001, 200007, 200008]  200000
llama-3_1             <|start_header_id|>  Helpful assistant  Yes        [128001, 128008, 128009]  128000
llama-3               <|start_header_id|>  Helpful assistant  No         [128001, 128009]          128000
llama-2               [INST] / <<SYS>>     Helpful assistant  Yes        [2]                       1
codellama_completion  (none)               (empty)            No         [2]                       1
codellama_instruct    [INST]               (empty)            No         [2]                       1

Relationship to Other Modules

  • Template Registry -- All templates are registered with ConvTemplateRegistry.register_conv_template().
  • Conversation Protocol -- Templates instantiate the Conversation dataclass from mlc_llm.protocol.conversation_protocol.
  • Reference documentation -- The source code links to the Meta Llama3 README and tokenizer.py as the reference for Llama 3 formatting.
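The register-then-look-up pattern can be sketched with a minimal name-keyed registry. Everything below is a hypothetical stand-in: Template and TemplateRegistry are illustrative names, and the duplicate-name check is an assumption of this sketch, not a description of how ConvTemplateRegistry or Conversation are implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Template:
    """Hypothetical stand-in for Conversation, reduced to two fields."""

    name: str
    stop_token_ids: List[int] = field(default_factory=list)


class TemplateRegistry:
    """Hypothetical registry that maps template names to templates."""

    _templates: Dict[str, Template] = {}

    @classmethod
    def register(cls, template: Template) -> None:
        # Assumption of this sketch: duplicate names are rejected.
        if template.name in cls._templates:
            raise ValueError(f"template {template.name!r} is already registered")
        cls._templates[template.name] = template

    @classmethod
    def get(cls, name: str) -> Optional[Template]:
        # Returns None when no template with that name was registered.
        return cls._templates.get(name)


TemplateRegistry.register(Template(name="llama-3", stop_token_ids=[128001, 128009]))
found = TemplateRegistry.get("llama-3")
missing = TemplateRegistry.get("not-registered")
print(found, missing)
```

Keying templates by name lets callers select a prompt format with a plain string such as "llama-3".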
