

Implementation:Mlc ai Mlc llm ConvTemplate

From Leeroopedia


Knowledge Sources
Domains LLM Serving, Conversation Formatting, Prompt Engineering
Last Updated 2026-02-09 19:00 GMT

Overview

ConvTemplate implements conversation template parsing and prompt construction for the MLC LLM JSON FFI layer, converting chat completion requests into model-ready prompt sequences including text, token, and image data.

Description

The conv_template.cc file provides the core machinery for transforming OpenAI-style chat completion requests into formatted prompts suitable for different LLM architectures. It operates within the mlc::llm::json_ffi namespace and consists of several key components:

ModelVisionConfig and ModelConfig are configuration structures that are deserialized from JSON. ModelVisionConfig holds vision-specific parameters such as hidden_size, image_size, patch_size, num_attention_heads, and num_hidden_layers. ModelConfig holds general model parameters including vocab_size, context_window_size, sliding_window_size, prefill_chunk_size, tensor_parallel_shards, pipeline_parallel_stages, max_batch_size, and an optional vision_config. Both provide FromJSON static methods for parsing.
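As a rough illustration of how such a configuration might be read, the sketch below fills ModelConfig-like fields from a flat key-value map and leaves the vision block empty for text-only models. FlatJson, RequireInt, ModelConfigSketch, and VisionConfigSketch are hypothetical names. The map only stands in for picojson::object, and treating every field as required is an assumption of this sketch, not necessarily the behavior of ModelConfig::FromJSON.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for picojson::object: a flat map of integer-valued fields.
using FlatJson = std::map<std::string, std::int64_t>;

// Read a required integer field; this sketch treats a missing key as an error.
std::int64_t RequireInt(const FlatJson& obj, const std::string& key) {
  auto it = obj.find(key);
  if (it == obj.end()) throw std::runtime_error("missing field: " + key);
  return it->second;
}

struct VisionConfigSketch {
  std::int64_t hidden_size, image_size, patch_size, num_attention_heads, num_hidden_layers;
};

struct ModelConfigSketch {
  std::int64_t vocab_size, context_window_size, sliding_window_size, prefill_chunk_size;
  std::int64_t tensor_parallel_shards, pipeline_parallel_stages, max_batch_size;
  std::optional<VisionConfigSketch> vision_config;  // empty for text-only models

  // `vision_obj` is null when the model configuration has no vision_config entry.
  static ModelConfigSketch FromFlat(const FlatJson& obj, const FlatJson* vision_obj) {
    ModelConfigSketch c{};
    c.vocab_size = RequireInt(obj, "vocab_size");
    c.context_window_size = RequireInt(obj, "context_window_size");
    c.sliding_window_size = RequireInt(obj, "sliding_window_size");
    c.prefill_chunk_size = RequireInt(obj, "prefill_chunk_size");
    c.tensor_parallel_shards = RequireInt(obj, "tensor_parallel_shards");
    c.pipeline_parallel_stages = RequireInt(obj, "pipeline_parallel_stages");
    c.max_batch_size = RequireInt(obj, "max_batch_size");
    if (vision_obj != nullptr) {
      c.vision_config = VisionConfigSketch{
          RequireInt(*vision_obj, "hidden_size"), RequireInt(*vision_obj, "image_size"),
          RequireInt(*vision_obj, "patch_size"), RequireInt(*vision_obj, "num_attention_heads"),
          RequireInt(*vision_obj, "num_hidden_layers")};
    }
    return c;
  }
};
```

Modeling vision_config as std::optional lets text-only and multimodal models share one configuration type; any code path that handles images has to check has_value() before reading image_size or patch_size.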

Conversation is the central class representing a conversation template. It manages system templates, role mappings, message separators, stop strings, stop token IDs, and role-specific message templates. The GetSystemText method substitutes the {system_message} placeholder in the system template with the actual system message. The GetRoleText method substitutes role-specific placeholders (e.g., {user_message}, {assistant_message}) and, when a function calling string is supplied, the {function_string} placeholder. The two Conversation::FromJSON overloads (one taking a picojson::object, one taking a raw JSON string) deserialize the full conversation template, including messages whose content is either a plain string or an array of content objects.
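The placeholder substitution performed by GetSystemText and GetRoleText can be sketched as plain string replacement. ConversationSketch and ReplaceAll are hypothetical names, and three behaviors are assumptions for illustration rather than the library's code: the message placeholder is derived from the role name as {<role>_message}, a role without a template passes its content through unchanged, and {function_string} is stripped when no function string is given.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Replace every occurrence of `placeholder` in `text` with `value`.
std::string ReplaceAll(std::string text, const std::string& placeholder,
                       const std::string& value) {
  if (placeholder.empty()) return text;
  std::size_t pos = 0;
  while ((pos = text.find(placeholder, pos)) != std::string::npos) {
    text.replace(pos, placeholder.size(), value);
    pos += value.size();  // resume after the inserted value so it is not rescanned
  }
  return text;
}

// Hypothetical, simplified stand-in for the Conversation class.
struct ConversationSketch {
  std::string system_template;                        // contains "{system_message}"
  std::map<std::string, std::string> role_templates;  // role -> template text

  std::string GetSystemText(const std::string& system_msg) const {
    return ReplaceAll(system_template, "{system_message}", system_msg);
  }

  std::string GetRoleText(const std::string& role, const std::string& content,
                          const std::optional<std::string>& fn_call_string) const {
    auto it = role_templates.find(role);
    // Assumption: a role without a template passes its content through unchanged.
    if (it == role_templates.end()) return content;
    // Fill {function_string} first so text inside the message itself is never rewritten;
    // when no function string is given, this sketch strips the placeholder.
    std::string text = ReplaceAll(it->second, "{function_string}", fn_call_string.value_or(""));
    // Assumption: the message placeholder is named "{<role>_message}".
    return ReplaceAll(text, "{" + role + "_message}", content);
  }
};
```

Filling the template-owned placeholder before inserting user content means a user who types a literal "{function_string}" does not trigger a second substitution.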

TryGetFunctionCallingString detects whether function calling is needed based on the request's tools and tool_choice fields. When tool_choice is "auto", it returns a JSON array of all functions; otherwise it returns the serialized JSON of the single function that tool_choice names. If no function calling is needed, the result holds an empty optional.
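A simplified version of that decision logic follows. ToolSketch and TryGetFunctionCallingStringSketch are hypothetical, and each tool carries an already-serialized JSON string instead of a structured function definition. The edge-case handling is assumed rather than taken from the source: a missing tool_choice is treated as "auto" (mirroring the OpenAI default), "none" disables function calling, and an unknown function name throws where the real function would return an error Result.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical simplified tool: a function name plus its already-serialized JSON.
struct ToolSketch {
  std::string name;
  std::string function_json;
};

// Returns std::nullopt when no function-calling string is needed.
std::optional<std::string> TryGetFunctionCallingStringSketch(
    const std::optional<std::vector<ToolSketch>>& tools,
    const std::optional<std::string>& tool_choice) {
  if (!tools.has_value() || tools->empty()) return std::nullopt;
  // Assumption: a missing tool_choice behaves like "auto" (the OpenAI default).
  const std::string choice = tool_choice.value_or("auto");
  if (choice == "none") return std::nullopt;  // assumption: "none" disables tools
  if (choice == "auto") {
    // Serialize every tool's function into one JSON array.
    std::string array = "[";
    for (std::size_t i = 0; i < tools->size(); ++i) {
      if (i > 0) array += ",";
      array += (*tools)[i].function_json;
    }
    return array + "]";
  }
  // Otherwise tool_choice names one function: return only that function's JSON.
  for (const ToolSketch& tool : *tools) {
    if (tool.name == choice) return tool.function_json;
  }
  throw std::invalid_argument("tool_choice names an unknown function: " + choice);
}
```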

CreatePrompt is the main prompt construction function. It combines the system message, processes all conversation messages (from both the template and the request), handles multimodal content (text and base64-encoded images), applies role templates and separators, and appends a final assistant turn marker. Image data is decoded, validated against the vision config, and converted to ImageData objects with computed embed sizes. The function returns a vector of Data objects (TextData, ImageData, or TokenData) that represent the complete prompt.
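The text-assembly half of CreatePrompt and the image embed-size calculation can be approximated as below. This is a text-only sketch that returns one string rather than a vector of Data objects. PromptTemplateSketch, BuildPromptSketch, and ImageEmbedSizeSketch are hypothetical, as are the fixed role prefixes and the choice of separator by message index modulo the separator count. The embed size assumes a ViT-style encoder that emits one embedding per non-overlapping square patch, i.e. (image_size / patch_size)^2.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified template: fixed role prefixes plus a list of separators.
struct PromptTemplateSketch {
  std::string system_text;                  // output of a GetSystemText-style step
  std::string user_prefix = "USER: ";
  std::string assistant_prefix = "ASSISTANT: ";
  std::vector<std::string> seps = {"\n"};   // must be non-empty
};

// Concatenate the system text and every (role, content) message, then end with an
// open assistant turn so the model continues generating from there.
std::string BuildPromptSketch(
    const PromptTemplateSketch& t,
    const std::vector<std::pair<std::string, std::string>>& messages) {
  assert(!t.seps.empty());
  std::string out = t.system_text;
  if (!out.empty()) out += t.seps[0];
  for (std::size_t i = 0; i < messages.size(); ++i) {
    const auto& [role, content] = messages[i];
    // Simplification: any role other than "user" gets the assistant prefix.
    const std::string& prefix = (role == "user") ? t.user_prefix : t.assistant_prefix;
    // Assumption: separators rotate by message index.
    out += prefix + content + t.seps[i % t.seps.size()];
  }
  out += t.assistant_prefix;  // final assistant turn marker, left without content
  return out;
}

// Assumption: a ViT-style encoder splits a square image into non-overlapping
// patch_size x patch_size patches and emits one embedding per patch.
int ImageEmbedSizeSketch(int image_size, int patch_size) {
  assert(patch_size > 0 && image_size % patch_size == 0);
  const int patches_per_side = image_size / patch_size;
  return patches_per_side * patches_per_side;
}
```

For example, a 336-pixel image with 14-pixel patches gives a 24 x 24 grid, so ImageEmbedSizeSketch(336, 14) returns 576.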

Usage

Use ConvTemplate when implementing a JSON FFI endpoint for chat completions. The Conversation is typically loaded from a model's configuration file, and CreatePrompt is called for each incoming ChatCompletionRequest to produce the prompt data that will be fed to the model.

Code Reference

Source Location

Signature

struct ModelVisionConfig {
  static ModelVisionConfig FromJSON(const picojson::object& json_obj);
};

struct ModelConfig {
  static ModelConfig FromJSON(const picojson::object& json_obj);
};

class Conversation {
public:
  Conversation();
  std::string GetSystemText(const std::string& system_msg) const;
  std::string GetRoleText(const std::string& role, const std::string& content,
                          const std::optional<std::string>& fn_call_string) const;
  static Result<Conversation> FromJSON(const picojson::object& json_obj);
  static Result<Conversation> FromJSON(const std::string& json_str);
};

Result<std::optional<std::string>> TryGetFunctionCallingString(
    const Conversation& conv, const ChatCompletionRequest& request);

Result<std::vector<Data>> CreatePrompt(const Conversation& conv,
                                       const ChatCompletionRequest& request,
                                       const ModelConfig& config, DLDevice device);

Import

#include "conv_template.h"

I/O Contract

Inputs

Name | Type | Required | Description
conv | Conversation | Yes | The conversation template with system message, role templates, and separators
request | ChatCompletionRequest | Yes | The incoming chat completion request with messages, tools, and parameters
config | ModelConfig | Yes | Model configuration including vocab size and optional vision config
device | DLDevice | Yes | The target device for tensor allocation (used for image data)
json_obj | picojson::object | Yes (FromJSON only) | JSON object to parse for a model configuration or conversation template

Outputs

Output | Produced by | Description
Result<std::vector<Data>> | CreatePrompt | On success, a vector of Data objects (TextData, ImageData, TokenData) forming the prompt; on error, an error message
Result<Conversation> | Conversation::FromJSON | On success, a parsed Conversation template; on error, an error message
Result<std::optional<std::string>> | TryGetFunctionCallingString | On success, an optional serialized function calling string; on error, an error message

Usage Examples

// Parse a conversation template from JSON
Result<Conversation> conv_result = Conversation::FromJSON(json_string);
if (conv_result.IsErr()) {
  LOG(ERROR) << conv_result.UnwrapErr();
  return;
}
Conversation conv = conv_result.Unwrap();

// Parse a chat completion request; check for errors before unwrapping
Result<ChatCompletionRequest> request_result = ChatCompletionRequest::FromJSON(request_json);
if (request_result.IsErr()) {
  LOG(ERROR) << request_result.UnwrapErr();
  return;
}
ChatCompletionRequest request = request_result.Unwrap();

// Parse the model configuration (config_json is a picojson::object),
// then build the prompt for the target DLDevice `device`
ModelConfig config = ModelConfig::FromJSON(config_json);
Result<std::vector<Data>> prompt = CreatePrompt(conv, request, config, device);
if (prompt.IsErr()) {
  LOG(ERROR) << prompt.UnwrapErr();
  return;
}
std::vector<Data> data = prompt.Unwrap();
// Feed data to the model
