Implementation:Mlc ai Mlc llm ConvTemplate
| Knowledge Sources | |
|---|---|
| Domains | LLM Serving, Conversation Formatting, Prompt Engineering |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
ConvTemplate implements conversation template parsing and prompt construction for the MLC LLM JSON FFI layer, converting chat completion requests into model-ready prompt sequences including text, token, and image data.
Description
The conv_template.cc file provides the core machinery for transforming OpenAI-style chat completion requests into formatted prompts suitable for different LLM architectures. It operates within the mlc::llm::json_ffi namespace and consists of several key components:
ModelVisionConfig and ModelConfig are configuration structures that are deserialized from JSON. ModelVisionConfig holds vision-specific parameters such as hidden_size, image_size, patch_size, num_attention_heads, and num_hidden_layers. ModelConfig holds general model parameters including vocab_size, context_window_size, sliding_window_size, prefill_chunk_size, tensor_parallel_shards, pipeline_parallel_stages, max_batch_size, and an optional vision_config. Both provide FromJSON static methods for parsing.
Conversation is the central class representing a conversation template. It manages system templates, role mappings, message separators, stop strings, stop token IDs, and role-specific message templates. The GetSystemText method substitutes the {system_message} placeholder in the system template with the actual system message. The GetRoleText method substitutes role-specific placeholders (e.g., {user_message}, {assistant_message}) and optionally the {function_string} placeholder for function calling. The Conversation::FromJSON method performs comprehensive JSON deserialization of the full conversation template, including messages that can be strings or arrays of content objects.
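The placeholder substitution described above can be illustrated with a minimal sketch. The helper names `ReplacePlaceholder` and `SketchSystemText` are hypothetical and are not taken from conv_template.cc; the only detail drawn from the description is that `{system_message}` is replaced by the actual system message.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from conv_template.cc): replaces every occurrence
// of `placeholder` in `tmpl` with `value` and returns the result.
std::string ReplacePlaceholder(std::string tmpl, const std::string& placeholder,
                               const std::string& value) {
  if (placeholder.empty()) return tmpl;
  std::size_t pos = 0;
  while ((pos = tmpl.find(placeholder, pos)) != std::string::npos) {
    tmpl.replace(pos, placeholder.size(), value);
    pos += value.size();  // continue after the inserted text so it is not rescanned
  }
  return tmpl;
}

// Sketch of what a GetSystemText-style method might do: fill the
// {system_message} slot of the system template.
std::string SketchSystemText(const std::string& system_template,
                             const std::string& system_msg) {
  return ReplacePlaceholder(system_template, "{system_message}", system_msg);
}
```

Role text could be produced the same way by substituting `{user_message}`, `{assistant_message}`, or `{function_string}` into the role-specific template.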
TryGetFunctionCallingString detects whether function calling is needed based on the request's tools and tool_choice fields. It returns a JSON array of all functions when tool_choice is "auto", or a single function's serialized JSON when one function is selected. The result is wrapped in an optional, which is empty when function calling does not apply.
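The selection logic can be sketched as follows. This is a simplified illustration, not the actual implementation: `ToolSketch` and `SketchFunctionCallingString` are hypothetical names, tool_choice is reduced to a plain string, the handling of "none" and of matching by function name is an assumption, and the real function returns a `Result` that can carry an error instead of silently returning an empty optional.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a tool entry: a function name plus its
// already-serialized JSON description.
struct ToolSketch {
  std::string name;
  std::string function_json;
};

// Assumed convention for this sketch: tool_choice is "none", "auto",
// or the name of one function.
std::optional<std::string> SketchFunctionCallingString(
    const std::vector<ToolSketch>& tools, const std::string& tool_choice) {
  if (tools.empty() || tool_choice == "none") return std::nullopt;
  if (tool_choice == "auto") {
    // Join all serialized functions into one JSON array.
    std::string out = "[";
    for (std::size_t i = 0; i < tools.size(); ++i) {
      if (i > 0) out += ",";
      out += tools[i].function_json;
    }
    out += "]";
    return out;
  }
  // Otherwise look for the single named function.
  for (const ToolSketch& tool : tools) {
    if (tool.name == tool_choice) return tool.function_json;
  }
  return std::nullopt;  // simplification: a real implementation may report an error
}
```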
CreatePrompt is the main prompt construction function. It combines the system message, processes all conversation messages (from both the template and the request), handles multimodal content (text and base64-encoded images), applies role templates and separators, and appends a final assistant turn marker. Image data is decoded, validated against the vision config, and converted to ImageData objects with computed embed sizes. The function returns a vector of Data objects (TextData, ImageData, or TokenData) that represent the complete prompt.
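The text-only part of this flow can be sketched as below. It is a deliberately reduced illustration under stated assumptions: `TemplateSketch`, `MessageSketch`, and `SketchTextPrompt` are hypothetical names, role templates are reduced to simple prefixes, and the output is a single string, whereas the real CreatePrompt returns a vector of Data objects and also handles image and token content.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified template: role templates reduced to prefixes.
struct TemplateSketch {
  std::string system_text;       // system text with placeholders already filled
  std::string user_prefix;       // e.g. "USER: "
  std::string assistant_prefix;  // e.g. "ASSISTANT: "
  std::string separator;         // inserted after every message
};

// Hypothetical message: a role name and plain-text content.
struct MessageSketch {
  std::string role;
  std::string content;
};

// Builds: system text, each message with its role prefix and a separator,
// then a trailing assistant marker that asks the model to respond.
std::string SketchTextPrompt(const TemplateSketch& tmpl,
                             const std::vector<MessageSketch>& messages) {
  std::string prompt;
  if (!tmpl.system_text.empty()) prompt += tmpl.system_text + tmpl.separator;
  for (const MessageSketch& msg : messages) {
    const std::string& prefix =
        (msg.role == "assistant") ? tmpl.assistant_prefix : tmpl.user_prefix;
    prompt += prefix + msg.content + tmpl.separator;
  }
  prompt += tmpl.assistant_prefix;  // final assistant turn marker
  return prompt;
}
```

Leaving the final assistant marker without content or separator is what signals that the next tokens belong to the assistant's reply.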
Usage
Use ConvTemplate when implementing a JSON FFI endpoint for chat completions. The Conversation is typically loaded from a model's configuration file, and CreatePrompt is called for each incoming ChatCompletionRequest to produce the prompt data that will be fed to the model.
Code Reference
Source Location
- Repository: Mlc_ai_Mlc_llm
- File: cpp/json_ffi/conv_template.cc
- Lines: 1-567
Signature
struct ModelVisionConfig {
  static ModelVisionConfig FromJSON(const picojson::object& json_obj);
};
struct ModelConfig {
  static ModelConfig FromJSON(const picojson::object& json_obj);
};
class Conversation {
 public:
  Conversation();
  std::string GetSystemText(const std::string& system_msg) const;
  std::string GetRoleText(const std::string& role, const std::string& content,
                          const std::optional<std::string>& fn_call_string) const;
  static Result<Conversation> FromJSON(const picojson::object& json_obj);
  static Result<Conversation> FromJSON(const std::string& json_str);
};
Result<std::optional<std::string>> TryGetFunctionCallingString(
    const Conversation& conv, const ChatCompletionRequest& request);
Result<std::vector<Data>> CreatePrompt(const Conversation& conv,
                                       const ChatCompletionRequest& request,
                                       const ModelConfig& config, DLDevice device);
Import
#include "conv_template.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| conv | Conversation | Yes | The conversation template with system message, role templates, and separators |
| request | ChatCompletionRequest | Yes | The incoming chat completion request with messages, tools, and parameters |
| config | ModelConfig | Yes | Model configuration including vocab size and optional vision config |
| device | DLDevice | Yes | The target device for tensor allocation (used for image data) |
| json_obj | picojson::object | Yes (FromJSON) | JSON object to parse for configuration or conversation template |
Outputs
| Name | Type | Description |
|---|---|---|
| Result<std::vector<Data>> | Result type | On success, a vector of Data objects (TextData, ImageData, TokenData) forming the prompt; on error, an error message |
| Result<Conversation> | Result type | On success, a parsed Conversation template; on error, an error message |
| Result<std::optional<std::string>> | Result type | On success, an optional serialized function calling string; on error, an error message |
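Every output above follows the same success-or-error pattern. The sketch below shows one way such a type could behave; `ResultSketch` and `ParsePositive` are hypothetical stand-ins written for illustration, not the actual `Result` class from the MLC LLM sources. Only the method names `IsOk`, `IsErr`, `Unwrap`, and `UnwrapErr` are taken from the usage shown on this page.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a Result-like type: holds either a value or an
// error message, never both.
template <typename T>
class ResultSketch {
 public:
  static ResultSketch Ok(T value) {
    ResultSketch r;
    r.value_ = std::move(value);
    return r;
  }
  static ResultSketch Error(std::string msg) {
    ResultSketch r;
    r.error_ = std::move(msg);
    return r;
  }
  bool IsOk() const { return value_.has_value(); }
  bool IsErr() const { return error_.has_value(); }
  // Callers are expected to check IsOk()/IsErr() before unwrapping.
  T Unwrap() { return std::move(*value_); }
  std::string UnwrapErr() { return std::move(*error_); }

 private:
  ResultSketch() = default;
  std::optional<T> value_;
  std::optional<std::string> error_;
};

// Hypothetical producer used only to demonstrate the pattern.
ResultSketch<int> ParsePositive(int x) {
  if (x <= 0) return ResultSketch<int>::Error("value must be positive");
  return ResultSketch<int>::Ok(x);
}
```

The practical consequence for callers is that each call should be followed by an `IsErr()` or `IsOk()` check before the payload is unwrapped.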
Usage Examples
// Parse a conversation template from JSON
Result<Conversation> conv_result = Conversation::FromJSON(json_string);
if (conv_result.IsErr()) {
  LOG(ERROR) << conv_result.UnwrapErr();
  return;
}
Conversation conv = conv_result.Unwrap();
// Parse a chat completion request, checking for errors before unwrapping
Result<ChatCompletionRequest> request_result = ChatCompletionRequest::FromJSON(request_json);
if (request_result.IsErr()) {
  LOG(ERROR) << request_result.UnwrapErr();
  return;
}
ChatCompletionRequest request = request_result.Unwrap();
// Create a prompt from the conversation and request
// (config_json is a picojson::object holding the model configuration)
ModelConfig config = ModelConfig::FromJSON(config_json);
Result<std::vector<Data>> prompt = CreatePrompt(conv, request, config, device);
if (prompt.IsErr()) {
  LOG(ERROR) << prompt.UnwrapErr();
  return;
}
std::vector<Data> data = prompt.Unwrap();
// Feed data to the model