Implementation:Mlc ai Mlc llm Request Header

Overview

The file cpp/serve/request.h defines the Request data type for the MLC-LLM serving engine. A Request represents a user-submitted text-generation task, encapsulating its unique identifier, multi-modal input data, and generation configuration parameters. The Request is intentionally designed as an immutable value object that can be serialized and re-dispatched to another node for distributed serving.

File Location

cpp/serve/request.h

Dependencies

Header	Purpose
`tvm/ffi/container/array.h`	TVM Array container
`tvm/ffi/reflection/registry.h`	TVM object reflection and registration
`tvm/ffi/string.h`	TVM String type
`tvm/runtime/object.h`	TVM Object base class
`../tokenizers/tokenizers.h`	Tokenizer for converting text to token IDs
`config.h`	GenerationConfig definition
`data.h`	Multi-modal Data type definitions

Namespace

All types are defined in mlc::llm::serve.

Class: RequestNode

RequestNode is the TVM object node implementing the request data. Despite being marked _type_mutable = true for the TVM type system, the class is conceptually immutable once created, as noted in the header documentation.

Fields

class RequestNode : public Object {
 public:
  String id;
  Array<Data> inputs;
  int prompt_tokens = -1;
  GenerationConfig generation_cfg;
  Object* rstate = nullptr;
  // ...
};

Field	Type	Description
`id`	`String`	Unique identifier for the request. Different requests must have different IDs.
`inputs`	`Array<Data>`	The user inputs, which may include multi-modal data (text, images, etc.). See `data.h` for the `Data` type definition.
`prompt_tokens`	`int`	The equivalent input sequence length. A value of `-1` indicates the length is unknown because untokenized text data exists in the inputs.
`generation_cfg`	`GenerationConfig`	Sampling and generation parameters including temperature, top_p, repetition penalty, max generation length, stop tokens, stop strings, and other controls.
`rstate`	`Object*`	A raw pointer providing a backward reference to the request's runtime state. This is a non-owning reference used for efficient lookups from the request back to its associated state.
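The relationship between `inputs` and `prompt_tokens` can be sketched with simplified stand-in types. This is not the MLC-LLM implementation: `TextData`, `TokenData`, and `CountPromptTokens` are hypothetical names built only on the standard library, and the sketch models just text and token inputs, whereas the real `Data` hierarchy in `data.h` covers more kinds of input. It returns `-1` as soon as any input is still untokenized text, mirroring the sentinel described in the table.

```cpp
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the Data subclasses declared in data.h.
struct TextData {
  std::string text;  // raw, untokenized text
};
struct TokenData {
  std::vector<int32_t> token_ids;  // already-tokenized input
};
using Data = std::variant<TextData, TokenData>;

// Returns the total number of input tokens, or -1 if any input is still
// untokenized text, in which case the length cannot be known yet.
inline int CountPromptTokens(const std::vector<Data>& inputs) {
  int total = 0;
  for (const Data& data : inputs) {
    if (std::holds_alternative<TextData>(data)) {
      return -1;
    }
    total += static_cast<int>(std::get<TokenData>(data).token_ids.size());
  }
  return total;
}
```

Under these assumptions, a request holding only token inputs reports their combined length, while a single remaining text input makes the whole count unknown.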

TVM Object Registration

static void RegisterReflection() {
    namespace refl = tvm::ffi::reflection;
    refl::ObjectDef<RequestNode>();
}

static constexpr const bool _type_has_method_sequal_reduce = false;
static constexpr const bool _type_has_method_shash_reduce = false;
static constexpr const bool _type_mutable = true;
TVM_FFI_DECLARE_OBJECT_INFO("mlc.serve.Request", RequestNode, Object);

RequestNode is registered under the type key "mlc.serve.Request". Structural equality and structural hashing are disabled because request identity is determined by the id field rather than by structural comparison.
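To illustrate what identity by id means in practice, here is a minimal sketch that tracks requests in a map keyed by id. `SimpleRequest` and `RequestRegistry` are hypothetical names invented for this example and do not appear in request.h; the engine's actual bookkeeping may differ.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified request record; not the real RequestNode.
struct SimpleRequest {
  std::string id;
  std::vector<int> token_ids;
};

// Hypothetical registry that distinguishes requests by id only.
class RequestRegistry {
 public:
  // Returns false and leaves the registry unchanged if the id is taken.
  bool Add(const SimpleRequest& request) {
    return requests_.emplace(request.id, request).second;
  }
  bool Contains(const std::string& id) const {
    return requests_.count(id) > 0;
  }
  std::size_t Size() const { return requests_.size(); }

 private:
  std::unordered_map<std::string, SimpleRequest> requests_;
};
```

In this sketch, two requests with identical contents but different ids are both accepted, while a second request reusing an existing id is rejected regardless of its contents.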

Class: Request

Request is the managed reference (smart pointer) for RequestNode.

Constructor

explicit Request(String id, Array<Data> inputs, GenerationConfig generation_cfg);

Creates a new request with the given ID, inputs, and generation configuration.

Static Method: FromUntokenized

static Request FromUntokenized(const Request& request, const Tokenizer& tokenizer);

Creates a new Request with all text data converted to token IDs using the provided tokenizer. The new request retains the same id as the original. This method is used during the request ingestion pipeline to transform raw text inputs into the tokenized form required by the model.

Because a Request is immutable, the original request is not modified. In the returned request, prompt_tokens holds the actual token count instead of -1.
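The behaviour described for FromUntokenized can be approximated with the following sketch. It is an illustration under stated assumptions rather than the actual method: `SimpleRequest`, `EncodeFn`, and `FromUntokenizedSketch` are hypothetical names, the tokenizer is reduced to a plain callable, generation_cfg is omitted, and only text and token inputs are modelled.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins; the real types live in data.h and request.h.
struct TextData {
  std::string text;
};
struct TokenData {
  std::vector<int32_t> token_ids;
};
using Data = std::variant<TextData, TokenData>;

struct SimpleRequest {
  std::string id;
  std::vector<Data> inputs;
  int prompt_tokens = -1;  // -1 means the length is unknown
};

// Hypothetical tokenizer interface: any callable mapping text to token IDs.
using EncodeFn = std::function<std::vector<int32_t>(const std::string&)>;

// Builds a new request whose text inputs are replaced by token inputs.
// The original request is left untouched and its id is preserved.
inline SimpleRequest FromUntokenizedSketch(const SimpleRequest& request,
                                           const EncodeFn& encode) {
  SimpleRequest result;
  result.id = request.id;
  int total = 0;
  for (const Data& data : request.inputs) {
    TokenData tokens;
    if (const auto* text = std::get_if<TextData>(&data)) {
      tokens.token_ids = encode(text->text);  // tokenize raw text
    } else {
      tokens = std::get<TokenData>(data);  // already tokenized, copy as-is
    }
    total += static_cast<int>(tokens.token_ids.size());
    result.inputs.emplace_back(std::move(tokens));
  }
  result.prompt_tokens = total;
  return result;
}
```

Returning a fresh object instead of mutating the argument keeps the sketch consistent with the immutability described above: any holder of the original request continues to see the untokenized inputs.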

Role in the Serving Pipeline

The Request object flows through the serving engine as follows:

1. A user submits a request via the API layer, which constructs a Request with untokenized text inputs.
2. Request::FromUntokenized tokenizes all text data.
3. The engine creates a RequestState (see Request State Implementation) to track runtime state.
4. The backward reference rstate links the immutable request to its mutable runtime state.
5. The request's generation_cfg controls logit processing, sampling, and stopping conditions throughout the generation lifecycle.
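The link between a request and its runtime state can be sketched as follows. The names `SimpleRequest`, `SimpleRequestState`, and `AttachState` are hypothetical, and the ownership model shown (the state keeps the request alive through a `std::shared_ptr`, while the request holds only a raw, non-owning pointer back) is an assumption chosen to mirror the description of rstate, not a copy of the engine code.

```cpp
#include <memory>
#include <string>
#include <vector>

struct SimpleRequestState;  // forward declaration for the back pointer

// Hypothetical stand-in for RequestNode: request data plus a non-owning
// back pointer to its runtime state.
struct SimpleRequest {
  std::string id;
  SimpleRequestState* rstate = nullptr;  // non-owning; null until attached
};

// Hypothetical stand-in for the runtime state: mutable bookkeeping that
// keeps the request alive.
struct SimpleRequestState {
  std::shared_ptr<SimpleRequest> request;
  std::vector<int> generated_token_ids;
};

// Creates the runtime state for a request and wires up the back reference.
// The caller owns the returned state and must reset request->rstate before
// destroying it, because the raw pointer would otherwise dangle.
inline std::unique_ptr<SimpleRequestState> AttachState(
    const std::shared_ptr<SimpleRequest>& request) {
  auto state = std::make_unique<SimpleRequestState>();
  state->request = request;
  request->rstate = state.get();
  return state;
}
```

A raw pointer avoids a reference cycle between the two objects and makes the lookup from request to state a single pointer dereference, at the cost of requiring the owner of the state to manage its lifetime carefully.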
