

Implementation:Mlc ai Mlc llm Serve Data

From Leeroopedia


Knowledge Sources
Domains: LLM Serving, Multi-Modal Data, Tokenization
Last Updated: 2026-02-09 19:00 GMT

Overview

The Serve Data implementation file provides the concrete implementations of the multi-modal data classes used in the MLC LLM serving pipeline. It contains the constructors, embedding computation, and TVM FFI registrations for these classes, along with the SplitData utility for splitting a data array at a given position, the JSON serialization of SampleResult log probabilities, and the construction of RequestStreamOutput streaming outputs.

Description

This source file (cpp/serve/data.cc) implements the following components:

  • TVM FFI Reflection Registration: Registers DataNode, TextDataNode, TokenDataNode, ImageDataNode, and RequestStreamOutputObj with the TVM reflection system.
  • SplitData function: Splits an array of Data objects at a specified position. It walks backwards from the end of the array, moving each element that lies entirely past the split point into the right-hand result, and partially truncates a TokenData element when the split point falls inside its token sequence. Only TokenData supports partial truncation.
  • TextData: The constructor wraps a string. Both GetLength() and GetEmbedding() are unsupported and trigger fatal errors, because text must be tokenized first.
  • TokenData: Constructors accept either an IntTuple or a std::vector<int32_t>. GetLength() returns the number of token IDs. GetEmbedding() delegates to model->TokenEmbed().
  • ImageData: The constructor accepts a Tensor of pixel values and an embed size. GetLength() returns the embed size, i.e. the number of embedding positions the image occupies. GetEmbedding() delegates to model->ImageEmbed().
  • TokenToLogProbJSON: A helper function that serializes a single token-probability pair to JSON format, including the token string (with proper JSON escaping), log probability, and byte representation.
  • SampleResult::GetTokenId: Returns the token ID from the sampled token pair.
  • SampleResult::GetLogProbJSON: Serializes the full sampling result (sampled token and top-probability tokens) to a JSON string conforming to the OpenAI API logprob specification.
  • RequestStreamOutput: Constructor and Usage factory method for streaming output objects. Also includes a TVM FFI-registered "unpack" function that converts the streaming output into an array for cross-language consumption.
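The length rules listed above can be illustrated with a small, self-contained C++ sketch. The types DataSketch, TextDataSketch, TokenDataSketch, and ImageDataSketch are hypothetical stand-ins for the TVM object classes declared in cpp/serve/data.h; they model only GetLength(), and a thrown exception stands in for the fatal error that TextData raises.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for DataNode: every element reports a length.
struct DataSketch {
  virtual ~DataSketch() = default;
  virtual int GetLength() const = 0;
};

// Raw text has no length until it is tokenized; an exception stands in for
// the fatal error raised by the real TextDataNode::GetLength().
struct TextDataSketch : DataSketch {
  std::string text;
  explicit TextDataSketch(std::string t) : text(std::move(t)) {}
  int GetLength() const override {
    throw std::logic_error("TextData must be tokenized first");
  }
};

// Token data: the length is the number of token IDs.
struct TokenDataSketch : DataSketch {
  std::vector<int32_t> token_ids;
  explicit TokenDataSketch(std::vector<int32_t> ids) : token_ids(std::move(ids)) {}
  int GetLength() const override { return static_cast<int>(token_ids.size()); }
};

// Image data: the length is the number of embedding positions it occupies.
struct ImageDataSketch : DataSketch {
  int embed_size;
  explicit ImageDataSketch(int size) : embed_size(size) {}
  int GetLength() const override { return embed_size; }
};
```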

Usage

These data classes are used throughout the serving pipeline:

  1. Incoming text is wrapped in TextData, then tokenized into TokenData.
  2. Images are preprocessed and wrapped in ImageData.
  3. The engine calls GetLength() and GetEmbedding() to compute input representations.
  4. SplitData is used during request preemption or chunked processing.
  5. After sampling, SampleResult::GetLogProbJSON serializes log probabilities for streaming output.
  6. RequestStreamOutput carries incremental results back through the callback stream.
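Steps 3 and 4 both rely on each element's GetLength(): the sum of the lengths is the total_length that SplitData expects, and, assuming the caller advances GetEmbedding()'s offset argument by each element's length, the running sum also gives the position where each element's embedding starts. The sketch below is hypothetical (InputElement and ComputeEmbeddingOffsets are illustrative names) and works on lengths only, not on real embeddings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified view of a request input after tokenization: each
// element keeps only its kind and the value GetLength() would return.
struct InputElement {
  std::string kind;  // e.g. "token" or "image"
  int length;
};

// Returns the offset at which each element's embedding would start in the
// concatenated input sequence, assuming the offset is advanced by each
// element's length. The sum of all lengths is written to *total_length.
std::vector<int> ComputeEmbeddingOffsets(const std::vector<InputElement>& inputs,
                                         int* total_length) {
  std::vector<int> offsets;
  offsets.reserve(inputs.size());
  int offset = 0;
  for (const InputElement& element : inputs) {
    offsets.push_back(offset);
    offset += element.length;
  }
  *total_length = offset;
  return offsets;
}
```

For an input of 5 prompt tokens, a 576-position image, and 12 more tokens, this gives offsets 0, 5, and 581 and a total length of 593.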

Code Reference

Source Location

| Property | Value |
|---|---|
| File | cpp/serve/data.cc |
| Namespace | mlc::llm::serve |
| Lines | 265 |
| Implements | Classes declared in cpp/serve/data.h |

Signature

namespace mlc {
namespace llm {
namespace serve {

// Split data array at position
std::pair<Array<Data>, Array<Data>> SplitData(
    const Array<Data>& original_data, int total_length, int split_pos);

// TextData
TextData::TextData(String text);
int TextDataNode::GetLength() const;            // FATAL: not supported
ObjectRef TextDataNode::GetEmbedding(...) const; // FATAL: not supported

// TokenData
TokenData::TokenData(IntTuple token_ids);
TokenData::TokenData(std::vector<int32_t> token_ids);
int TokenDataNode::GetLength() const;
ObjectRef TokenDataNode::GetEmbedding(Model model, ObjectRef* dst, int offset) const;

// ImageData
ImageData::ImageData(Tensor image, int embed_size);
int ImageDataNode::GetLength() const;
ObjectRef ImageDataNode::GetEmbedding(Model model, ObjectRef* dst, int offset) const;

// SampleResult
int32_t SampleResult::GetTokenId() const;
std::string SampleResult::GetLogProbJSON(const Tokenizer& tokenizer, bool logprob) const;

// RequestStreamOutput
RequestStreamOutput::RequestStreamOutput(
    String request_id,
    std::vector<std::vector<int64_t>> group_delta_token_ids,
    std::optional<std::vector<std::vector<String>>> group_delta_logprob_json_strs,
    std::vector<Optional<String>> group_finish_reason,
    std::vector<String> group_extra_prefix_string);
RequestStreamOutput RequestStreamOutput::Usage(
    String request_id, String request_final_usage_json_str);

}  // namespace serve
}  // namespace llm
}  // namespace mlc

Import

#include "serve/data.h"

Dependencies:

  • data.h (the corresponding header)
  • model.h for the Model class used in embedding computation
  • tvm/ffi/function.h and tvm/ffi/reflection/registry.h for FFI registration

I/O Contract

SplitData

| Direction | Name | Type | Description |
|---|---|---|---|
| Input | original_data | const Array<Data>& | The data array to split |
| Input | total_length | int | Total token-equivalent length of all data in the array |
| Input | split_pos | int | Position at which to split (0-indexed from the start) |
| Output | (return).first | Array<Data> | Left portion (positions 0 to split_pos-1) |
| Output | (return).second | Array<Data> | Right portion (positions split_pos to end) |

Constraints:

  • split_pos >= 0
  • total_length >= split_pos
  • Only TokenData supports partial truncation; splitting within a TextData or ImageData is not allowed.
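The backward walk described for SplitData can be sketched with standard C++ containers. This is a hypothetical re-creation, not the code in data.cc: Item and SplitItems are illustrative names, an Item is either a splittable token sequence or an opaque element that only reports a length (as ImageData does), and std::invalid_argument stands in for the CHECK failures on violated constraints.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for Data: a token sequence (splittable) or an opaque
// element such as an image that only reports its length.
struct Item {
  bool is_tokens;
  std::vector<int> token_ids;  // meaningful when is_tokens is true
  int opaque_length = 0;       // meaningful when is_tokens is false
  int Length() const {
    return is_tokens ? static_cast<int>(token_ids.size()) : opaque_length;
  }
};

// Returns {left, right}, where left covers positions [0, split_pos) and right
// covers [split_pos, total_length). Exceptions stand in for CHECK failures.
std::pair<std::vector<Item>, std::vector<Item>> SplitItems(const std::vector<Item>& items,
                                                           int total_length, int split_pos) {
  if (split_pos < 0 || total_length < split_pos) {
    throw std::invalid_argument("require 0 <= split_pos <= total_length");
  }
  std::vector<Item> lhs = items;
  std::vector<Item> rhs;  // filled back-to-front, reversed before returning
  while (total_length > split_pos) {
    if (lhs.empty() || lhs.back().Length() > total_length) {
      throw std::invalid_argument("total_length does not match the data");
    }
    Item last = lhs.back();
    int last_length = last.Length();
    if (total_length - last_length >= split_pos) {
      // The whole element lies at or after the split point: move it right.
      rhs.push_back(last);
      lhs.pop_back();
      total_length -= last_length;
      continue;
    }
    // The split point falls inside this element; only tokens can be cut.
    if (!last.is_tokens) {
      throw std::invalid_argument("only token data supports partial truncation");
    }
    int keep = last_length - (total_length - split_pos);
    lhs.back() = Item{true, std::vector<int>(last.token_ids.begin(), last.token_ids.begin() + keep)};
    rhs.push_back(Item{true, std::vector<int>(last.token_ids.begin() + keep, last.token_ids.end())});
    total_length = split_pos;
  }
  std::reverse(rhs.begin(), rhs.end());
  return {lhs, rhs};
}
```

Collecting right-hand elements during the backward walk and reversing once at the end keeps the right part in its original order.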

SampleResult::GetLogProbJSON

| Direction | Name | Type | Description |
|---|---|---|---|
| Input | tokenizer | const Tokenizer& | Tokenizer for token-to-string conversion |
| Input | logprob | bool | Whether to include log probability information |
| Output | (return) | std::string | JSON string conforming to the OpenAI logprob format, or an empty string if logprob is false |
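A rough sketch of the serialization shape described for TokenToLogProbJSON and GetLogProbJSON, under stated assumptions: a plain std::vector<std::string> token table stands in for the Tokenizer, each pair is assumed to hold a probability whose natural log is emitted as "logprob", and the escaping follows standard JSON string rules (RFC 8259) rather than necessarily matching data.cc. JsonEscape, WriteTokenFields, and LogProbJSON are hypothetical names, and the clamp before std::log is a choice of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// (token id, probability); assumed to mirror the pairs held by SampleResult.
using TokenProb = std::pair<int32_t, float>;

// Escapes quotes, backslashes, and control characters (< 0x20) for a JSON
// string literal; all other bytes pass through unchanged.
std::string JsonEscape(const std::string& s) {
  std::string out;
  for (unsigned char ch : s) {
    if (ch == '"') {
      out += "\\\"";
    } else if (ch == '\\') {
      out += "\\\\";
    } else if (ch < 0x20) {
      char buf[7];
      std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(ch));
      out += buf;
    } else {
      out += static_cast<char>(ch);
    }
  }
  return out;
}

// Appends the "token", "logprob", and "bytes" fields for one token.
void WriteTokenFields(const std::vector<std::string>& token_table, const TokenProb& tp,
                      std::ostringstream* os) {
  const std::string& token = token_table.at(tp.first);
  // Clamp so that a zero probability does not print "-inf" (invalid JSON).
  float logprob = std::log(std::max(tp.second, 1e-10f));
  *os << "\"token\": \"" << JsonEscape(token) << "\", \"logprob\": " << logprob
      << ", \"bytes\": [";
  for (size_t i = 0; i < token.size(); ++i) {
    if (i > 0) *os << ", ";
    *os << static_cast<int>(static_cast<unsigned char>(token[i]));
  }
  *os << "]";
}

// Builds {token, logprob, bytes, top_logprobs: [...]}, or "" when disabled.
std::string LogProbJSON(const std::vector<std::string>& token_table, const TokenProb& sampled,
                        const std::vector<TokenProb>& top, bool logprob) {
  if (!logprob) return "";
  std::ostringstream os;
  os << "{";
  WriteTokenFields(token_table, sampled, &os);
  os << ", \"top_logprobs\": [";
  for (size_t i = 0; i < top.size(); ++i) {
    if (i > 0) os << ", ";
    os << "{";
    WriteTokenFields(token_table, top[i], &os);
    os << "}";
  }
  os << "]}";
  return os.str();
}
```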

RequestStreamOutput Constructor

| Direction | Name | Type | Description |
|---|---|---|---|
| Input | request_id | String | Unique identifier for the request |
| Input | group_delta_token_ids | std::vector<std::vector<int64_t>> | New token IDs per output group since the last callback |
| Input | group_delta_logprob_json_strs | std::optional<std::vector<std::vector<String>>> | Optional logprob JSON strings per group |
| Input | group_finish_reason | std::vector<Optional<String>> | Finish reason per group (empty if the group has not finished) |
| Input | group_extra_prefix_string | std::vector<String> | Extra prefix strings per group |
| Output | (constructed object) | RequestStreamOutput | A managed reference to the stream output object |
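To make the per-group layout concrete, here is a hypothetical plain-C++ mirror of the stream-output fields. StreamOutputSketch, Delta, and Usage are illustrative names; the real class is a TVM object with String and Optional members. The sketch assumes that every group_* vector has one entry per output group, and that a usage-only output carries no token deltas, only the final usage JSON string.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mirror of the stream-output payload; each group_* vector is
// assumed to be indexed by output group.
struct StreamOutputSketch {
  std::string request_id;
  std::vector<std::vector<int64_t>> group_delta_token_ids;
  std::optional<std::vector<std::vector<std::string>>> group_delta_logprob_json_strs;
  std::vector<std::optional<std::string>> group_finish_reason;
  std::vector<std::string> group_extra_prefix_string;
  std::optional<std::string> request_final_usage_json_str;

  // Builds an incremental output and checks that all groups line up.
  static StreamOutputSketch Delta(std::string id, std::vector<std::vector<int64_t>> delta_ids,
                                  std::optional<std::vector<std::vector<std::string>>> logprobs,
                                  std::vector<std::optional<std::string>> finish_reason,
                                  std::vector<std::string> extra_prefix) {
    size_t n = delta_ids.size();
    if (finish_reason.size() != n || extra_prefix.size() != n ||
        (logprobs.has_value() && logprobs->size() != n)) {
      throw std::invalid_argument("per-group vectors must have the same length");
    }
    StreamOutputSketch out;
    out.request_id = std::move(id);
    out.group_delta_token_ids = std::move(delta_ids);
    out.group_delta_logprob_json_strs = std::move(logprobs);
    out.group_finish_reason = std::move(finish_reason);
    out.group_extra_prefix_string = std::move(extra_prefix);
    return out;
  }

  // Builds a usage-only output: no token deltas, just the final usage JSON.
  static StreamOutputSketch Usage(std::string id, std::string usage_json) {
    StreamOutputSketch out;
    out.request_id = std::move(id);
    out.request_final_usage_json_str = std::move(usage_json);
    return out;
  }
};
```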

Usage Examples

Creating token data and computing embeddings:

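#include <utility>
#include <vector>

#include "serve/data.h"
#include "serve/model.h"

using namespace mlc::llm::serve;

// Hypothetical helper; `model` is an already-loaded Model.
void EmbedTokens(Model model) {
  // Create token data from a vector of token IDs
  std::vector<int32_t> ids = {101, 2054, 2003, 102};
  TokenData token_data(std::move(ids));
  int length = token_data->GetLength();  // Returns 4

  // Compute embeddings using the model
  ObjectRef embedding = token_data->GetEmbedding(model);
}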

Splitting data for chunked prefill:

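#include <algorithm>

// token_data1 and token_data2 are existing TokenData objects
Array<Data> data_array = {token_data1, token_data2};
int total_len = token_data1->GetLength() + token_data2->GetLength();
int chunk_size = 512;

// SplitData requires split_pos <= total_length, so clamp to the input length
int split_pos = std::min(chunk_size, total_len);
auto [first_chunk, remainder] = SplitData(data_array, total_len, split_pos);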
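SplitData requires split_pos <= total_length, so a chunked-prefill loop has to clamp each split position to the length that remains. The hypothetical ChunkTokens helper below works on a single flat token sequence, so it sidesteps the multi-element bookkeeping that SplitData handles; it only illustrates how the clamp keeps every step within that constraint until the input is exhausted.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical chunked-prefill driver over a flat token sequence. Each step
// takes split_pos = min(chunk_size, remaining length), so split_pos never
// exceeds the remaining total length, and carries the rest to the next step.
std::vector<std::vector<int>> ChunkTokens(const std::vector<int>& tokens, int chunk_size) {
  assert(chunk_size > 0);  // a non-positive chunk size would never make progress
  std::vector<std::vector<int>> chunks;
  std::vector<int> remainder = tokens;
  while (!remainder.empty()) {
    int total_length = static_cast<int>(remainder.size());
    int split_pos = std::min(chunk_size, total_length);
    chunks.emplace_back(remainder.begin(), remainder.begin() + split_pos);
    remainder.erase(remainder.begin(), remainder.begin() + split_pos);
  }
  return chunks;
}
```

For a 1,200-token prompt and chunk_size = 512, this produces chunks of 512, 512, and 176 tokens.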

Serializing sample results:

// Each entry is a (token id, probability) pair
SampleResult result;
result.sampled_token_id = {42, 0.95f};
result.top_prob_tokens = {{42, 0.95f}, {17, 0.03f}, {88, 0.02f}};

// `tokenizer` is an existing Tokenizer
std::string logprob_json = result.GetLogProbJSON(tokenizer, /*logprob=*/true);
// Returns JSON with token, logprob, bytes, and top_logprobs array
