Implementation:Mlc ai Mlc llm ConvTemplate

Knowledge Sources	Mlc_ai_Mlc_llm
Domains	LLM Serving, Conversation Formatting, Prompt Engineering
Last Updated	2026-02-09 19:00 GMT

Overview

ConvTemplate implements conversation template parsing and prompt construction for the MLC LLM JSON FFI layer, converting chat completion requests into model-ready prompt sequences including text, token, and image data.

Description

The conv_template.cc file provides the core machinery for transforming OpenAI-style chat completion requests into formatted prompts suitable for different LLM architectures. It operates within the mlc::llm::json_ffi namespace and consists of several key components:

ModelVisionConfig and ModelConfig are configuration structures that are deserialized from JSON. ModelVisionConfig holds vision-specific parameters such as hidden_size, image_size, patch_size, num_attention_heads, and num_hidden_layers. ModelConfig holds general model parameters including vocab_size, context_window_size, sliding_window_size, prefill_chunk_size, tensor_parallel_shards, pipeline_parallel_stages, max_batch_size, and an optional vision_config. Both provide FromJSON static methods for parsing.

Conversation is the central class representing a conversation template. It manages system templates, role mappings, message separators, stop strings, stop token IDs, and role-specific message templates. The GetSystemText method substitutes the {system_message} placeholder in the system template with the actual system message. The GetRoleText method substitutes role-specific placeholders (e.g., {user_message}, {assistant_message}) and optionally the {function_string} placeholder for function calling. The Conversation::FromJSON method performs comprehensive JSON deserialization of the full conversation template, including messages that can be strings or arrays of content objects.

TryGetFunctionCallingString detects whether function calling is needed based on the request's tools and tool_choice fields. It returns either a single function's serialized JSON or a JSON array of all functions when tool_choice is "auto".

CreatePrompt is the main prompt construction function. It combines the system message, processes all conversation messages (from both the template and the request), handles multimodal content (text and base64-encoded images), applies role templates and separators, and appends a final assistant turn marker. Image data is decoded, validated against the vision config, and converted to ImageData objects with computed embed sizes. The function returns a vector of Data objects (TextData, ImageData, or TokenData) that represent the complete prompt.

Usage

Use ConvTemplate when implementing a JSON FFI endpoint for chat completions. The Conversation is typically loaded from a model's configuration file, and CreatePrompt is called for each incoming ChatCompletionRequest to produce the prompt data that will be fed to the model.

Code Reference

Source Location

Repository: Mlc_ai_Mlc_llm
File: cpp/json_ffi/conv_template.cc
Lines: 1-567

Signature

struct ModelVisionConfig {
  static ModelVisionConfig FromJSON(const picojson::object& json_obj);
};

struct ModelConfig {
  static ModelConfig FromJSON(const picojson::object& json_obj);
};

class Conversation {
public:
  Conversation();
  std::string GetSystemText(const std::string& system_msg) const;
  std::string GetRoleText(const std::string& role, const std::string& content,
                          const std::optional<std::string>& fn_call_string) const;
  static Result<Conversation> FromJSON(const picojson::object& json_obj);
  static Result<Conversation> FromJSON(const std::string& json_str);
};

Result<std::optional<std::string>> TryGetFunctionCallingString(
    const Conversation& conv, const ChatCompletionRequest& request);

Result<std::vector<Data>> CreatePrompt(const Conversation& conv,
                                       const ChatCompletionRequest& request,
                                       const ModelConfig& config, DLDevice device);

Import

#include "conv_template.h"

I/O Contract

Inputs

Name	Type	Required	Description
conv	Conversation	Yes	The conversation template with system message, role templates, and separators
request	ChatCompletionRequest	Yes	The incoming chat completion request with messages, tools, and parameters
config	ModelConfig	Yes	Model configuration including vocab size and optional vision config
device	DLDevice	Yes	The target device for tensor allocation (used for image data)
json_obj	picojson::object	Yes (FromJSON)	JSON object to parse for configuration or conversation template

Outputs

Name	Type	Description
Result<std::vector>	Result type	On success, a vector of Data objects (TextData, ImageData, TokenData) forming the prompt; on error, an error message
Result<Conversation>	Result type	On success, a parsed Conversation template; on error, an error message
Result<std::optional<std::string>>	Result type	On success, an optional serialized function calling string; on error, an error message

Usage Examples

// Parse a conversation template from JSON
Result<Conversation> conv_result = Conversation::FromJSON(json_string);
if (conv_result.IsErr()) {
  LOG(ERROR) << conv_result.UnwrapErr();
  return;
}
Conversation conv = conv_result.Unwrap();

// Parse a chat completion request
Result<ChatCompletionRequest> request_result = ChatCompletionRequest::FromJSON(request_json);
ChatCompletionRequest request = request_result.Unwrap();

// Create a prompt from the conversation and request
ModelConfig config = ModelConfig::FromJSON(config_json);
Result<std::vector<Data>> prompt = CreatePrompt(conv, request, config, device);
if (prompt.IsOk()) {
  std::vector<Data> data = prompt.Unwrap();
  // Feed data to the model
}

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment