

Implementation:Mlc ai Mlc llm Request Header

From Leeroopedia
Revision as of 15:52, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Mlc_ai_Mlc_llm_Request_Header.md)


Overview

The file cpp/serve/request.h defines the Request data type for the MLC-LLM serving engine. A Request represents a user-submitted text-generation task, encapsulating its unique identifier, multi-modal input data, and generation configuration parameters. The Request is intentionally designed as an immutable value object that can be serialized and re-dispatched to another node for distributed serving.

File Location

cpp/serve/request.h

Dependencies

| Header | Purpose |
|---|---|
| tvm/ffi/container/array.h | TVM Array container |
| tvm/ffi/reflection/registry.h | TVM object reflection and registration |
| tvm/ffi/string.h | TVM String type |
| tvm/runtime/object.h | TVM Object base class |
| ../tokenizers/tokenizers.h | Tokenizer for converting text to token IDs |
| config.h | GenerationConfig definition |
| data.h | Multi-modal Data type definitions |

Namespace

All types are defined in mlc::llm::serve.

Class: RequestNode

RequestNode is the TVM object node implementing the request data. Despite being marked _type_mutable = true for the TVM type system, the class is conceptually immutable once created, as noted in the header documentation.

Fields

class RequestNode : public Object {
 public:
  String id;
  Array<Data> inputs;
  int prompt_tokens = -1;
  GenerationConfig generation_cfg;
  Object* rstate = nullptr;
  // ...
};
| Field | Type | Description |
|---|---|---|
| id | String | Unique identifier for the request. Different requests must have different IDs. |
| inputs | Array<Data> | The user inputs, which may include multi-modal data (text, images, etc.). See data.h for the Data type definition. |
| prompt_tokens | int | The equivalent input sequence length. A value of -1 indicates the length is unknown because untokenized text data exists in the inputs. |
| generation_cfg | GenerationConfig | Sampling and generation parameters including temperature, top_p, repetition penalty, max generation length, stop tokens, stop strings, and other controls. |
| rstate | Object* | A raw pointer providing a backward reference to the request's runtime state. This is a non-owning reference used for efficient lookups from the request back to its associated state. |
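The -1 convention for prompt_tokens can be illustrated with a small stand-alone sketch. The types TextInput and TokenInput and the function CountPromptTokens are hypothetical stand-ins, not the MLC-LLM Data classes from data.h. The sketch assumes only the rule stated in the table: the length is unknown as soon as any untokenized text is present.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the multi-modal Data types in data.h.
struct TextInput { std::string text; };             // untokenized text
struct TokenInput { std::vector<int> token_ids; };  // already tokenized
using Input = std::variant<TextInput, TokenInput>;

// Returns the total token count, or -1 if any input is still untokenized.
int CountPromptTokens(const std::vector<Input>& inputs) {
  int total = 0;
  for (const Input& input : inputs) {
    if (std::holds_alternative<TextInput>(input)) {
      return -1;  // length is unknown until the text is tokenized
    }
    total += static_cast<int>(std::get<TokenInput>(input).token_ids.size());
  }
  return total;
}
```

A caller can therefore treat a negative value as "tokenize first" rather than as an error.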

TVM Object Registration

static void RegisterReflection() {
    namespace refl = tvm::ffi::reflection;
    refl::ObjectDef<RequestNode>();
}

static constexpr const bool _type_has_method_sequal_reduce = false;
static constexpr const bool _type_has_method_shash_reduce = false;
static constexpr const bool _type_mutable = true;
TVM_FFI_DECLARE_OBJECT_INFO("mlc.serve.Request", RequestNode, Object);

Registered under type key "mlc.serve.Request". Structural equality and hash are disabled because request identity is determined by the id field rather than by structural comparison.

Class: Request

Request is the managed reference (smart pointer) for RequestNode.

Constructor

explicit Request(String id, Array<Data> inputs, GenerationConfig generation_cfg);

Creates a new request with the given ID, inputs, and generation configuration.
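The split between a node that holds the data and a lightweight reference that points to it can be sketched without TVM. The example below is an analogy only: SketchRequestNode and SketchRequest are hypothetical names, and std::shared_ptr stands in for TVM's own reference-counted object system, which works differently internally.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node: owns the data, analogous in spirit to RequestNode.
struct SketchRequestNode {
  std::string id;
  std::vector<std::string> inputs;
  int prompt_tokens = -1;  // unknown until inputs are tokenized
};

// Hypothetical reference: a cheap-to-copy handle, analogous in spirit to Request.
class SketchRequest {
 public:
  explicit SketchRequest(std::string id, std::vector<std::string> inputs)
      : node_(std::make_shared<SketchRequestNode>()) {
    node_->id = std::move(id);
    node_->inputs = std::move(inputs);
  }

  // Read-only access mirrors the "conceptually immutable" contract.
  const SketchRequestNode* operator->() const { return node_.get(); }

  // True when both handles point at the same underlying node.
  bool same_as(const SketchRequest& other) const { return node_ == other.node_; }

 private:
  std::shared_ptr<SketchRequestNode> node_;
};
```

Copying a SketchRequest copies only the handle, so both copies observe the same node. Two handles constructed separately with identical arguments still refer to distinct nodes.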

Static Method: FromUntokenized

static Request FromUntokenized(const Request& request, const Tokenizer& tokenizer);

Creates a new Request with all text data converted to token IDs using the provided tokenizer. The new request retains the same id as the original. This method is used during the request ingestion pipeline to transform raw text inputs into the tokenized form required by the model.

Because the method returns a new Request rather than modifying the original, it is the returned request whose prompt_tokens field holds the actual token count instead of -1.
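The behaviour described for FromUntokenized (same id, text replaced by token IDs, original left untouched) can be approximated as follows. Everything here is an assumption made for illustration: SketchRequest, FromUntokenizedSketch, and especially ToyEncode, which splits on whitespace and uses each word's length as a fake token ID. It is not the MLC-LLM Tokenizer API.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the Data types and the Request object.
struct TextInput { std::string text; };
struct TokenInput { std::vector<int> token_ids; };
using Input = std::variant<TextInput, TokenInput>;

struct SketchRequest {
  std::string id;
  std::vector<Input> inputs;
  int prompt_tokens = -1;
};

// Toy tokenizer (assumption): one token per whitespace-separated word,
// using the word length as a fake token ID.
std::vector<int> ToyEncode(const std::string& text) {
  std::vector<int> ids;
  std::istringstream stream(text);
  std::string word;
  while (stream >> word) ids.push_back(static_cast<int>(word.size()));
  return ids;
}

// Builds a new, fully tokenized request; the argument is not modified.
SketchRequest FromUntokenizedSketch(const SketchRequest& request) {
  SketchRequest result;
  result.id = request.id;  // the id is carried over unchanged
  result.prompt_tokens = 0;
  for (const Input& input : request.inputs) {
    TokenInput tokens;
    if (const auto* text = std::get_if<TextInput>(&input)) {
      tokens.token_ids = ToyEncode(text->text);
    } else {
      tokens = std::get<TokenInput>(input);  // already tokenized: copy as-is
    }
    result.prompt_tokens += static_cast<int>(tokens.token_ids.size());
    result.inputs.push_back(std::move(tokens));
  }
  return result;
}
```

Returning a fresh object instead of mutating the argument keeps the original request usable, which fits the immutability contract described in the overview.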

Role in the Serving Pipeline

The Request object flows through the serving engine as follows:

  1. A user submits a request via the API layer, which constructs a Request with untokenized text inputs.
  2. Request::FromUntokenized tokenizes all text data.
  3. The engine creates a RequestState (see Request State Implementation) to track runtime state.
  4. The backward reference rstate links the immutable request to its mutable runtime state.
  5. The request's generation_cfg controls logit processing, sampling, and stopping conditions throughout the generation lifecycle.
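Steps 3 and 4 above can be sketched as a non-owning back pointer from a request to state that something else owns. The names SketchRequest, SketchRequestState, and SketchEngine are hypothetical, and the ownership model (the engine holding each state in a std::unique_ptr keyed by request id) is an assumption for illustration. This page does not describe how the real engine stores RequestState.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct SketchRequestState;  // forward declaration for the back reference

struct SketchRequest {
  std::string id;
  std::vector<int> token_ids;
  SketchRequestState* rstate = nullptr;  // non-owning back reference
};

struct SketchRequestState {
  const SketchRequest* request = nullptr;
  std::vector<int> generated;  // mutable: grows as tokens are produced
};

class SketchEngine {
 public:
  // Creates runtime state for the request and links the two together.
  void AddRequest(SketchRequest* request) {
    auto state = std::make_unique<SketchRequestState>();
    state->request = request;
    request->rstate = state.get();
    states_[request->id] = std::move(state);
  }

  // Destroys the state and clears the back reference so it cannot dangle.
  void RemoveRequest(SketchRequest* request) {
    states_.erase(request->id);
    request->rstate = nullptr;
  }

  std::size_t NumActive() const { return states_.size(); }

 private:
  std::unordered_map<std::string, std::unique_ptr<SketchRequestState>> states_;
};
```

With this layout, code holding only the request reaches its state in one pointer dereference instead of a map lookup. The cost is that the owner has to clear the pointer when the state is destroyed.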
