Implementation:Mlc ai Mlc llm Serve Data
| Knowledge Sources | |
|---|---|
| Domains | LLM Serving, Multi-Modal Data, Tokenization |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
The Serve Data implementation file provides the concrete implementations of the multi-modal data classes used in the MLC LLM serving pipeline. It covers constructors, embedding computation, TVM FFI registrations, the SplitData utility for splitting data arrays at arbitrary positions, JSON serialization of SampleResult log probabilities, and construction of RequestStreamOutput streaming outputs.
Description
This source file (cpp/serve/data.cc) implements the following components:
- TVM FFI Reflection Registration: Registers DataNode, TextDataNode, TokenDataNode, ImageDataNode, and RequestStreamOutputObj with the TVM reflection system.
- SplitData function: Splits an array of Data objects at a specified position. It works backwards through the data array, moving complete data elements when possible, and partially truncating TokenData when the split point falls within a token sequence. Only TokenData supports partial truncation.
- TextData: Constructor wraps a string. Both GetLength() and GetEmbedding() are unsupported and trigger fatal errors -- text must be tokenized first.
- TokenData: Constructors accept either an IntTuple or a std::vector<int32_t>. GetLength() returns the number of token IDs. GetEmbedding() delegates to model->TokenEmbed().
- ImageData: Constructor accepts a Tensor of pixel values and an embed size. GetLength() returns the embed size. GetEmbedding() delegates to model->ImageEmbed().
- TokenToLogProbJSON: A helper function that serializes a single token-probability pair to JSON format, including the token string (with proper JSON escaping), log probability, and byte representation.
- SampleResult::GetTokenId: Returns the token ID from the sampled token pair.
- SampleResult::GetLogProbJSON: Serializes the full sampling result (sampled token and top-probability tokens) to a JSON string conforming to the OpenAI API logprob specification.
- RequestStreamOutput: Constructor and Usage factory method for streaming output objects. Also includes a TVM FFI-registered "unpack" function that converts the streaming output into an array for cross-language consumption.
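The length semantics listed above can be illustrated with a small, self-contained sketch. It does not use TVM: DataSketch, TextSketch, TokenSketch, ImageSketch, and TotalLength are hypothetical stand-ins invented for this example, and a thrown exception stands in for the fatal error that the real TextData raises.

```cpp
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real classes are TVM objects declared in data.h.
struct DataSketch {
  virtual ~DataSketch() = default;
  virtual int GetLength() const = 0;
};

struct TextSketch : DataSketch {
  explicit TextSketch(std::string s) : text(std::move(s)) {}
  // Raw text has no token-equivalent length until it is tokenized.
  // (Assumption: an exception replaces the fatal error of the real class.)
  int GetLength() const override {
    throw std::logic_error("text must be tokenized first");
  }
  std::string text;
};

struct TokenSketch : DataSketch {
  explicit TokenSketch(std::vector<int32_t> ids) : token_ids(std::move(ids)) {}
  // Length is the number of token IDs.
  int GetLength() const override { return static_cast<int>(token_ids.size()); }
  std::vector<int32_t> token_ids;
};

struct ImageSketch : DataSketch {
  explicit ImageSketch(int size) : embed_size(size) {}
  // Length is the embed size supplied at construction.
  int GetLength() const override { return embed_size; }
  int embed_size;
};

// Sums the token-equivalent length of a mixed sequence of inputs.
int TotalLength(const std::vector<std::unique_ptr<DataSketch>>& items) {
  int total = 0;
  for (const auto& item : items) total += item->GetLength();
  return total;
}
```

A sum of this kind is what a caller would need in order to supply the total_length argument of SplitData.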
Usage
These data classes are used throughout the serving pipeline:
- Incoming text is wrapped in TextData, then tokenized into TokenData.
- Images are preprocessed and wrapped in ImageData.
- The engine calls GetLength() and GetEmbedding() to compute input representations.
- SplitData is used during request preemption or chunked processing.
- After sampling, SampleResult::GetLogProbJSON serializes log probabilities for streaming output.
- RequestStreamOutput carries incremental results back through the callback stream.
Code Reference
Source Location
| Property | Value |
|---|---|
| File | cpp/serve/data.cc |
| Namespace | mlc::llm::serve |
| Lines | 265 |
| Implements | Classes declared in cpp/serve/data.h |
Signature
namespace mlc {
namespace llm {
namespace serve {
// Split data array at position
std::pair<Array<Data>, Array<Data>> SplitData(
const Array<Data>& original_data, int total_length, int split_pos);
// TextData
TextData::TextData(String text);
int TextDataNode::GetLength() const; // FATAL: not supported
ObjectRef TextDataNode::GetEmbedding(...) const; // FATAL: not supported
// TokenData
TokenData::TokenData(IntTuple token_ids);
TokenData::TokenData(std::vector<int32_t> token_ids);
int TokenDataNode::GetLength() const;
ObjectRef TokenDataNode::GetEmbedding(Model model, ObjectRef* dst, int offset) const;
// ImageData
ImageData::ImageData(Tensor image, int embed_size);
int ImageDataNode::GetLength() const;
ObjectRef ImageDataNode::GetEmbedding(Model model, ObjectRef* dst, int offset) const;
// SampleResult
int32_t SampleResult::GetTokenId() const;
std::string SampleResult::GetLogProbJSON(const Tokenizer& tokenizer, bool logprob) const;
// RequestStreamOutput
RequestStreamOutput::RequestStreamOutput(
String request_id,
std::vector<std::vector<int64_t>> group_delta_token_ids,
std::optional<std::vector<std::vector<String>>> group_delta_logprob_json_strs,
std::vector<Optional<String>> group_finish_reason,
std::vector<String> group_extra_prefix_string);
RequestStreamOutput RequestStreamOutput::Usage(
String request_id, String request_final_usage_json_str);
} // namespace serve
} // namespace llm
} // namespace mlc
Import
#include "serve/data.h"
Dependencies:
- data.h (the corresponding header)
- model.h for the Model class used in embedding computation
- tvm/ffi/function.h and tvm/ffi/reflection/registry.h for FFI registration
I/O Contract
SplitData
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | original_data | const Array<Data>& | The data array to split |
| Input | total_length | int | Total token-equivalent length of all data in the array |
| Input | split_pos | int | Position at which to split (0-indexed from the start) |
| Output | (return).first | Array<Data> | Left portion (positions 0 to split_pos-1) |
| Output | (return).second | Array<Data> | Right portion (positions split_pos to end) |
Constraints:
- split_pos >= 0
- total_length >= split_pos
- Only TokenData supports partial truncation; splitting within a TextData or ImageData is not allowed.
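The backward-walking split and its constraints can be sketched without TVM as follows. This is an illustration under stated assumptions, not the code in data.cc: Item and SplitItems are hypothetical names, plain std::vector replaces Array<Data>, and exceptions replace the fatal checks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for Data: a token run, or an opaque item such as an image.
struct Item {
  bool is_tokens;
  std::vector<int32_t> tokens;  // used when is_tokens is true
  int opaque_length;            // used when is_tokens is false

  int Length() const {
    return is_tokens ? static_cast<int>(tokens.size()) : opaque_length;
  }
};

// Splits `lhs` so that the first result covers positions [0, split_pos).
std::pair<std::vector<Item>, std::vector<Item>> SplitItems(
    std::vector<Item> lhs, int total_length, int split_pos) {
  if (split_pos < 0 || total_length < split_pos) {
    throw std::invalid_argument("require 0 <= split_pos <= total_length");
  }
  std::vector<Item> rhs;  // collected back to front, reversed at the end
  while (total_length > split_pos) {
    assert(!lhs.empty());
    Item last = lhs.back();
    lhs.pop_back();
    int len = last.Length();
    if (total_length - len >= split_pos) {
      // The whole item lies to the right of the split point: move it over.
      rhs.push_back(last);
      total_length -= len;
      continue;
    }
    // The split point falls inside this item; only token runs can be cut.
    if (!last.is_tokens) {
      throw std::invalid_argument("only token runs support partial truncation");
    }
    int keep = split_pos - (total_length - len);  // tokens staying on the left
    Item left{true, std::vector<int32_t>(last.tokens.begin(), last.tokens.begin() + keep), 0};
    Item right{true, std::vector<int32_t>(last.tokens.begin() + keep, last.tokens.end()), 0};
    lhs.push_back(left);
    rhs.push_back(right);
    total_length = split_pos;
  }
  std::reverse(rhs.begin(), rhs.end());
  return {lhs, rhs};
}
```

In this sketch, splitting a sequence of lengths 3 (tokens), 2 (image), 4 (tokens) at position 7 cuts the last token run in two, while splitting at position 4 is rejected because the cut would land inside the image.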
SampleResult::GetLogProbJSON
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | tokenizer | const Tokenizer& | Tokenizer for token-to-string conversion |
| Input | logprob | bool | Whether to include log probability information |
| Output | (return) | std::string | JSON string conforming to OpenAI logprob format, or empty string if logprob is false |
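To make the per-token entry concrete, the following sketch builds a JSON object with the token, logprob, and bytes fields that TokenToLogProbJSON is described as emitting. EscapeJSON and LogProbEntry are hypothetical helpers written for this page; the escaping rules are a minimal set chosen for illustration, and the function takes an already-computed log probability rather than a token-probability pair, so it may differ from the real helper in detail.

```cpp
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>

// Escapes quotes, backslashes, and ASCII control characters for a JSON string.
// Bytes >= 0x80 are passed through unchanged (assumed to be valid UTF-8).
std::string EscapeJSON(const std::string& s) {
  std::string out;
  for (unsigned char ch : s) {
    if (ch == '"') {
      out += "\\\"";
    } else if (ch == '\\') {
      out += "\\\\";
    } else if (ch < 0x20) {
      char buf[8];
      std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(ch));
      out += buf;
    } else {
      out += static_cast<char>(ch);
    }
  }
  return out;
}

// Builds one entry: the escaped token text, its log probability, and its raw bytes.
std::string LogProbEntry(const std::string& token, double logprob) {
  std::ostringstream os;
  os << "{\"token\": \"" << EscapeJSON(token) << "\", \"logprob\": " << logprob
     << ", \"bytes\": [";
  for (std::size_t i = 0; i < token.size(); ++i) {
    if (i > 0) os << ", ";
    os << static_cast<int>(static_cast<unsigned char>(token[i]));
  }
  os << "]}";
  return os.str();
}
```

The bytes array matters because a single token may not be a complete UTF-8 sequence on its own; exposing the raw bytes lets a client reassemble text that spans several tokens.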
RequestStreamOutput Constructor
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | request_id | String | Unique identifier for the request |
| Input | group_delta_token_ids | std::vector<std::vector<int64_t>> | New token IDs per output group since last callback |
| Input | group_delta_logprob_json_strs | std::optional<std::vector<std::vector<String>>> | Optional logprob JSON strings per group |
| Input | group_finish_reason | std::vector<Optional<String>> | Finish reason per group (None if not finished) |
| Input | group_extra_prefix_string | std::vector<String> | Extra prefix strings per group |
| Output | (constructed object) | RequestStreamOutput | A managed reference to the stream output object |
Usage Examples
Creating token data and computing embeddings:
#include "serve/data.h"
// Create token data from a vector of token IDs
std::vector<int32_t> ids = {101, 2054, 2003, 102};
TokenData token_data(ids);
int length = token_data->GetLength(); // Returns 4
// Compute embeddings; `model` is an already-initialized Model.
// (Assumes dst and offset take default values declared in data.h.)
ObjectRef embedding = token_data->GetEmbedding(model);
Splitting data for chunked prefill:
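#include <algorithm>
Array<Data> data_array = {token_data1, token_data2};
int total_len = token_data1->GetLength() + token_data2->GetLength();
int chunk_size = 512;
// SplitData requires split_pos <= total_length, so clamp the chunk size.
int split_pos = std::min(chunk_size, total_len);
auto [first_chunk, remainder] = SplitData(data_array, total_len, split_pos);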
Serializing sample results:
SampleResult result;
result.sampled_token_id = {42, 0.95f};
result.top_prob_tokens = {{42, 0.95f}, {17, 0.03f}, {88, 0.02f}};
std::string logprob_json = result.GetLogProbJSON(tokenizer, /*logprob=*/true);
// Returns JSON with token, logprob, bytes, and top_logprobs array
Related Pages
- Mlc_ai_Mlc_llm_Serve_Data_Header - The header declaring these data classes
- Mlc_ai_Mlc_llm_Engine_Interface - The engine that processes data through the serving pipeline
- Mlc_ai_Mlc_llm_Engine_Action - Engine actions that invoke data operations during prefill and decode
- Mlc_ai_Mlc_llm_OpenAI_API_Protocol_Header - The API protocol that consumes logprob JSON output