Implementation:Mlc ai Mlc llm Serve Data

Knowledge Sources	Mlc_ai_Mlc_llm
Domains	LLM Serving, Multi-Modal Data, Tokenization
Last Updated	2026-02-09 19:00 GMT

Overview

The Serve Data implementation file provides the concrete implementations for the multi-modal data classes used in the MLC LLM serving pipeline. It includes constructors, embedding computation, TVM FFI registrations, the SplitData utility for splitting data arrays at arbitrary positions, the SampleResult log-probability JSON serialization, and the RequestStreamOutput streaming output construction.

Description

This source file (cpp/serve/data.cc) implements the following components:

TVM FFI Reflection Registration: Registers DataNode, TextDataNode, TokenDataNode, ImageDataNode, and RequestStreamOutputObj with the TVM reflection system.

SplitData function: Splits an array of Data objects at a specified position. It works backwards through the data array, moving complete data elements when possible, and partially truncating TokenData when the split point falls within a token sequence. Only TokenData supports partial truncation.
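
The backward walk described above can be illustrated with a minimal, self-contained sketch. It does not use the real TVM types: `Chunk` and `SplitChunks` are hypothetical stand-ins for `Data` and `SplitData`, and the error handling (a C++ exception instead of a fatal check) is an assumption made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Data hierarchy: a chunk is either a token
// sequence (splittable) or an opaque item such as an image (not splittable).
struct Chunk {
  bool is_tokens;
  std::vector<int32_t> token_ids;  // used when is_tokens is true
  int opaque_length;               // used when is_tokens is false
  int Length() const {
    return is_tokens ? static_cast<int>(token_ids.size()) : opaque_length;
  }
};

// Sketch of the split: walk backwards, move whole chunks to the right-hand
// side while possible, and cut a token chunk if the split lands inside it.
std::pair<std::vector<Chunk>, std::vector<Chunk>> SplitChunks(
    const std::vector<Chunk>& original, int total_length, int split_pos) {
  assert(split_pos >= 0);
  assert(total_length >= split_pos);
  std::vector<Chunk> lhs(original.begin(), original.end());
  std::vector<Chunk> rhs;  // collected back-to-front, reversed before returning
  while (total_length > split_pos) {
    assert(!lhs.empty());
    Chunk last = lhs.back();
    lhs.pop_back();
    const int len = last.Length();
    if (total_length - len >= split_pos) {
      // The whole chunk lies at or after the split point: move it intact.
      rhs.push_back(last);
      total_length -= len;
      continue;
    }
    // The split point falls inside this chunk.
    if (!last.is_tokens) {
      throw std::runtime_error("only token chunks support partial truncation");
    }
    const int keep = len - (total_length - split_pos);
    auto mid = last.token_ids.begin() + keep;
    lhs.push_back(Chunk{true, std::vector<int32_t>(last.token_ids.begin(), mid), 0});
    rhs.push_back(Chunk{true, std::vector<int32_t>(mid, last.token_ids.end()), 0});
    total_length = split_pos;
  }
  std::reverse(rhs.begin(), rhs.end());
  return {lhs, rhs};
}
```

In this sketch the right-hand side is accumulated in reverse and flipped once at the end, so both halves keep the original element order.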

TextData: Constructor wraps a string. Both GetLength() and GetEmbedding() are unsupported and trigger fatal errors -- text must be tokenized first.

TokenData: Constructors accept either an IntTuple or a std::vector<int32_t>. GetLength() returns the number of token IDs. GetEmbedding() delegates to model->TokenEmbed().

ImageData: Constructor accepts a Tensor of pixel values and an embed size. GetLength() returns the embed size. GetEmbedding() delegates to model->ImageEmbed().
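
The length contract of the three data classes can be summarized with a small sketch. The types below (`DataSketch`, `TextSketch`, `TokenSketch`, `ImageSketch`) are hypothetical simplifications, not the TVM object classes, and an exception stands in for the fatal error raised by the real TextData.

```cpp
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified base class mirroring only the GetLength() contract.
struct DataSketch {
  virtual ~DataSketch() = default;
  virtual int GetLength() const = 0;
};

// Text has no token-equivalent length until it is tokenized.
struct TextSketch : DataSketch {
  std::string text;
  explicit TextSketch(std::string t) : text(std::move(t)) {}
  int GetLength() const override {
    throw std::logic_error("text must be tokenized first");
  }
};

// Token data: length is the number of token IDs.
struct TokenSketch : DataSketch {
  std::vector<int32_t> token_ids;
  explicit TokenSketch(std::vector<int32_t> ids) : token_ids(std::move(ids)) {}
  int GetLength() const override { return static_cast<int>(token_ids.size()); }
};

// Image data: length is the embed size supplied at construction.
struct ImageSketch : DataSketch {
  int embed_size;
  explicit ImageSketch(int size) : embed_size(size) {}
  int GetLength() const override { return embed_size; }
};

// Total token-equivalent length of a mixed input sequence.
int TotalLength(const std::vector<std::unique_ptr<DataSketch>>& inputs) {
  int total = 0;
  for (const auto& item : inputs) total += item->GetLength();
  return total;
}
```

A sum like `TotalLength` is the kind of value a caller would pass as `total_length` to SplitData.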

TokenToLogProbJSON: A helper function that serializes a single token-probability pair to JSON format, including the token string (with proper JSON escaping), log probability, and byte representation.
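
As a rough illustration of what such a helper involves, the sketch below serializes one token entry with string escaping, a log probability, and a byte array. `EscapeJSONString` and `TokenEntryToJSON` are hypothetical names; the escaping rules, the clamp to 1e-10 before taking the logarithm, and the field layout are assumptions, not the exact behavior of the code in data.cc.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>

// Escape a string for embedding inside a JSON string literal.
// Quotes and backslashes get a backslash; control characters become \u00XX.
// Bytes >= 0x80 pass through unchanged (assumes the input is valid UTF-8).
std::string EscapeJSONString(const std::string& s) {
  std::string out;
  for (unsigned char ch : s) {
    if (ch == '"') {
      out += "\\\"";
    } else if (ch == '\\') {
      out += "\\\\";
    } else if (ch < 0x20) {
      char buf[8];
      std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(ch));
      out += buf;
    } else {
      out += static_cast<char>(ch);
    }
  }
  return out;
}

// Serialize one (token string, probability) pair as a JSON object.
std::string TokenEntryToJSON(const std::string& token, float prob) {
  std::ostringstream os;
  os << "{\"token\": \"" << EscapeJSONString(token) << "\", ";
  // Clamp to avoid log(0); the clamp value is an assumption of this sketch.
  os << "\"logprob\": " << std::log(std::max(prob, 1e-10f)) << ", ";
  os << "\"bytes\": [";
  for (std::size_t i = 0; i < token.size(); ++i) {
    if (i != 0) os << ", ";
    os << static_cast<int>(static_cast<unsigned char>(token[i]));
  }
  os << "]}";
  return os.str();
}
```

The byte array matters because a single token may hold only part of a multi-byte UTF-8 character, in which case the token string alone cannot be decoded reliably by the client.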

SampleResult::GetTokenId: Returns the token ID from the sampled token pair.

SampleResult::GetLogProbJSON: Serializes the full sampling result (sampled token and top-probability tokens) to a JSON string conforming to the OpenAI API logprob specification.

RequestStreamOutput: Constructor and Usage factory method for streaming output objects. Also includes a TVM FFI-registered "unpack" function that converts the streaming output into an array for cross-language consumption.

Usage

These data classes are used throughout the serving pipeline:

Incoming text is wrapped in TextData, then tokenized into TokenData.
Images are preprocessed and wrapped in ImageData.
The engine calls GetLength() and GetEmbedding() to compute input representations.
SplitData is used during request preemption or chunked processing.
After sampling, SampleResult::GetLogProbJSON serializes log probabilities for streaming output.
RequestStreamOutput carries incremental results back through the callback stream.

Code Reference

Source Location

Property	Value
File	`cpp/serve/data.cc`
Namespace	`mlc::llm::serve`
Lines	265
Implements	Classes declared in `cpp/serve/data.h`

Signature

namespace mlc {
namespace llm {
namespace serve {

// Split data array at position
std::pair<Array<Data>, Array<Data>> SplitData(
    const Array<Data>& original_data, int total_length, int split_pos);

// TextData
TextData::TextData(String text);
int TextDataNode::GetLength() const;            // FATAL: not supported
ObjectRef TextDataNode::GetEmbedding(...) const; // FATAL: not supported

// TokenData
TokenData::TokenData(IntTuple token_ids);
TokenData::TokenData(std::vector<int32_t> token_ids);
int TokenDataNode::GetLength() const;
ObjectRef TokenDataNode::GetEmbedding(Model model, ObjectRef* dst, int offset) const;

// ImageData
ImageData::ImageData(Tensor image, int embed_size);
int ImageDataNode::GetLength() const;
ObjectRef ImageDataNode::GetEmbedding(Model model, ObjectRef* dst, int offset) const;

// SampleResult
int32_t SampleResult::GetTokenId() const;
std::string SampleResult::GetLogProbJSON(const Tokenizer& tokenizer, bool logprob) const;

// RequestStreamOutput
RequestStreamOutput::RequestStreamOutput(
    String request_id,
    std::vector<std::vector<int64_t>> group_delta_token_ids,
    std::optional<std::vector<std::vector<String>>> group_delta_logprob_json_strs,
    std::vector<Optional<String>> group_finish_reason,
    std::vector<String> group_extra_prefix_string);
RequestStreamOutput RequestStreamOutput::Usage(
    String request_id, String request_final_usage_json_str);

}  // namespace serve
}  // namespace llm
}  // namespace mlc

Import

#include "serve/data.h"

Dependencies:

data.h (the corresponding header)
model.h for the Model class used in embedding computation
tvm/ffi/function.h and tvm/ffi/reflection/registry.h for FFI registration

I/O Contract

SplitData

Direction	Name	Type	Description
Input	original_data	`const Array<Data>&`	The data array to split
Input	total_length	`int`	Total token-equivalent length of all data in the array
Input	split_pos	`int`	Position at which to split (0-indexed from the start)
Output	(return).first	`Array<Data>`	Left portion (positions 0 to split_pos-1)
Output	(return).second	`Array<Data>`	Right portion (positions split_pos to end)

Constraints:

split_pos >= 0
total_length >= split_pos
Only TokenData supports partial truncation; splitting within a TextData or ImageData is not allowed.

SampleResult::GetLogProbJSON

Direction	Name	Type	Description
Input	tokenizer	`const Tokenizer&`	Tokenizer for token-to-string conversion
Input	logprob	`bool`	Whether to include log probability information
Output	(return)	`std::string`	JSON string conforming to OpenAI logprob format, or empty string if logprob is false
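
The contract in the table can be sketched as follows. This is an illustrative approximation only: `TokenProb` and `LogProbJSONSketch` are hypothetical names, token strings are assumed to be already JSON-escaped, and the `bytes` field is omitted for brevity.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// (already-escaped token text, probability); a stand-in for the real pair type.
using TokenProb = std::pair<std::string, float>;

// Write the shared "token" and "logprob" fields of one entry.
static void WriteEntryFields(std::ostringstream& os, const TokenProb& tp) {
  os << "\"token\": \"" << tp.first << "\", \"logprob\": "
     << std::log(std::max(tp.second, 1e-10f));
}

// Returns "" when logprob is false; otherwise an object describing the
// sampled token followed by a top_logprobs array.
std::string LogProbJSONSketch(const TokenProb& sampled,
                              const std::vector<TokenProb>& top, bool logprob) {
  if (!logprob) return "";
  std::ostringstream os;
  os << "{";
  WriteEntryFields(os, sampled);
  os << ", \"top_logprobs\": [";
  for (std::size_t i = 0; i < top.size(); ++i) {
    if (i != 0) os << ", ";
    os << "{";
    WriteEntryFields(os, top[i]);
    os << "}";
  }
  os << "]}";
  return os.str();
}
```

Returning an empty string when `logprob` is false lets the caller skip the logprob payload without a separate flag.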

RequestStreamOutput Constructor

Direction	Name	Type	Description
Input	request_id	`String`	Unique identifier for the request
Input	group_delta_token_ids	`std::vector<std::vector<int64_t>>`	New token IDs per output group since last callback
Input	group_delta_logprob_json_strs	`std::optional<std::vector<std::vector<String>>>`	Optional logprob JSON strings per group
Input	group_finish_reason	`std::vector<Optional<String>>`	Finish reason per group (None if not finished)
Input	group_extra_prefix_string	`std::vector<String>`	Extra prefix strings per group
Output	(constructed object)	`RequestStreamOutput`	A managed reference to the stream output object

Usage Examples

Creating token data and computing embeddings:

#include "serve/data.h"

// Create token data from a vector of token IDs
std::vector<int32_t> ids = {101, 2054, 2003, 102};
TokenData token_data(ids);
int length = token_data->GetLength();  // Returns 4

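// Compute embeddings using an already-initialized Model instance.
// This call omits dst and offset, so it assumes data.h declares default
// arguments for them; otherwise pass them explicitly.
ObjectRef embedding = token_data->GetEmbedding(model);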

Splitting data for chunked prefill:

Array<Data> data_array = {token_data1, token_data2};
int total_len = token_data1->GetLength() + token_data2->GetLength();
int chunk_size = 512;  // Must satisfy 0 <= chunk_size <= total_len

auto [first_chunk, remainder] = SplitData(data_array, total_len, chunk_size);

Serializing sample results:

SampleResult result;
result.sampled_token_id = {42, 0.95f};
result.top_prob_tokens = {{42, 0.95f}, {17, 0.03f}, {88, 0.02f}};

std::string logprob_json = result.GetLogProbJSON(tokenizer, /*logprob=*/true);
// Returns JSON with token, logprob, bytes, and top_logprobs array

Related Pages

Mlc_ai_Mlc_llm_Serve_Data_Header - The header declaring these data classes
Mlc_ai_Mlc_llm_Engine_Interface - The engine that processes data through the serving pipeline
Mlc_ai_Mlc_llm_Engine_Action - Engine actions that invoke data operations during prefill and decode
Mlc_ai_Mlc_llm_OpenAI_API_Protocol_Header - The API protocol that consumes logprob JSON output
