Implementation:Mlc ai Mlc llm Request Header
Overview
The file cpp/serve/request.h defines the Request data type for the MLC-LLM serving engine. A Request represents a user-submitted text-generation task, encapsulating its unique identifier, multi-modal input data, and generation configuration parameters. The Request is intentionally designed as an immutable value object that can be serialized and re-dispatched to another node for distributed serving.
File Location
cpp/serve/request.h
Dependencies
| Header | Purpose |
|---|---|
| tvm/ffi/container/array.h | TVM Array container |
| tvm/ffi/reflection/registry.h | TVM object reflection and registration |
| tvm/ffi/string.h | TVM String type |
| tvm/runtime/object.h | TVM Object base class |
| ../tokenizers/tokenizers.h | Tokenizer for converting text to token IDs |
| config.h | GenerationConfig definition |
| data.h | Multi-modal Data type definitions |
Namespace
All types are defined in mlc::llm::serve.
Class: RequestNode
RequestNode is the TVM object node implementing the request data. Despite being marked _type_mutable = true for the TVM type system, the class is conceptually immutable once created, as noted in the header documentation.
Fields
class RequestNode : public Object {
public:
String id;
Array<Data> inputs;
int prompt_tokens = -1;
GenerationConfig generation_cfg;
Object* rstate = nullptr;
// ...
};
| Field | Type | Description |
|---|---|---|
| id | String | Unique identifier for the request. Different requests must have different IDs. |
| inputs | Array<Data> | The user inputs, which may include multi-modal data (text, images, etc.). See data.h for the Data type definition. |
| prompt_tokens | int | The equivalent input sequence length. A value of -1 indicates the length is unknown because untokenized text data exists in the inputs. |
| generation_cfg | GenerationConfig | Sampling and generation parameters including temperature, top_p, repetition penalty, max generation length, stop tokens, stop strings, and other controls. |
| rstate | Object* | A raw pointer providing a backward reference to the request's runtime state. This is a non-owning reference used for efficient lookups from the request back to its associated state. |
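The -1 sentinel for prompt_tokens can be illustrated with a small standalone sketch. It does not use the TVM types: TextInput, TokenInput, InputSketch, and ComputePromptTokens are hypothetical names introduced here, and only text and token inputs are modeled, whereas the Data types in data.h also cover other modalities.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the Data types declared in data.h.
struct TextInput { std::string text; };             // untokenized text
struct TokenInput { std::vector<int> token_ids; };  // already tokenized
using InputSketch = std::variant<TextInput, TokenInput>;

// Returns the total token count, or -1 as soon as an untokenized text
// input is found, following the documented meaning of prompt_tokens.
int ComputePromptTokens(const std::vector<InputSketch>& inputs) {
  int total = 0;
  for (const InputSketch& input : inputs) {
    if (std::holds_alternative<TextInput>(input)) {
      return -1;  // length is unknown until the text is tokenized
    }
    total += static_cast<int>(std::get<TokenInput>(input).token_ids.size());
  }
  return total;
}

// Usage example: a fully tokenized prompt versus a mixed one.
void PromptTokensExample() {
  std::vector<InputSketch> tokenized = {TokenInput{{1, 2, 3}}, TokenInput{{4, 5}}};
  std::vector<InputSketch> mixed = {TokenInput{{1, 2, 3}}, TextInput{"hello"}};
  assert(ComputePromptTokens(tokenized) == 5);
  assert(ComputePromptTokens(mixed) == -1);
}
```

A single untokenized input is enough to make the whole length unknown, which is why the sketch returns early instead of summing the remaining inputs.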
TVM Object Registration
static void RegisterReflection() {
namespace refl = tvm::ffi::reflection;
refl::ObjectDef<RequestNode>();
}
static constexpr const bool _type_has_method_sequal_reduce = false;
static constexpr const bool _type_has_method_shash_reduce = false;
static constexpr const bool _type_mutable = true;
TVM_FFI_DECLARE_OBJECT_INFO("mlc.serve.Request", RequestNode, Object);
The node is registered under the type key "mlc.serve.Request". Structural equality and structural hashing are disabled because request identity is determined by the id field rather than by structural comparison.
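To make id-based identity concrete, the following sketch keys a lookup table by request id instead of comparing request contents. RequestSketch and RequestRegistrySketch are hypothetical types written for this page; how the engine actually stores its requests is not specified by this header, so treat the registry as an assumption.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for RequestNode; only the fields needed here.
struct RequestSketch {
  std::string id;
  std::vector<int> token_ids;
};

// Hypothetical registry keyed by request id rather than by content.
class RequestRegistrySketch {
 public:
  // Returns false if a request with the same id is already registered.
  bool Add(const RequestSketch& request) {
    return requests_.emplace(request.id, request).second;
  }

  // Returns nullptr if no request with this id is registered.
  const RequestSketch* Find(const std::string& id) const {
    auto it = requests_.find(id);
    return it == requests_.end() ? nullptr : &it->second;
  }

 private:
  std::unordered_map<std::string, RequestSketch> requests_;
};

void RegistryExample() {
  RequestRegistrySketch registry;
  // Same content, different ids: both count as distinct requests.
  bool added_first = registry.Add({"req-1", {1, 2, 3}});
  bool added_second = registry.Add({"req-2", {1, 2, 3}});
  // Reusing an id is rejected even though the content differs.
  bool added_duplicate = registry.Add({"req-1", {9}});
  assert(added_first && added_second && !added_duplicate);
  assert(registry.Find("req-1")->token_ids.size() == 3);
  assert(registry.Find("req-3") == nullptr);
}
```

Two requests with identical inputs remain distinct as long as their ids differ, which is the behavior a structural comparison would not give.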
Class: Request
Request is the managed reference (smart pointer) for RequestNode.
Constructor
explicit Request(String id, Array<Data> inputs, GenerationConfig generation_cfg);
Creates a new request with the given ID, inputs, and generation configuration.
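The split between a node class and a managed reference can be sketched without TVM. The example below uses std::shared_ptr purely as an analogy, since TVM objects use their own reference counting; NodeSketch, HandleSketch, and SameAs are hypothetical names, and generation_cfg is reduced to a single temperature value for brevity.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for RequestNode; generation_cfg is reduced to a
// single temperature value.
struct NodeSketch {
  std::string id;
  std::vector<std::string> inputs;
  double temperature;
};

// Hypothetical stand-in for the Request handle.
class HandleSketch {
 public:
  explicit HandleSketch(std::string id, std::vector<std::string> inputs, double temperature)
      : node_(std::make_shared<NodeSketch>(
            NodeSketch{std::move(id), std::move(inputs), temperature})) {}

  // Read-only access to the fields, in the style of request->id.
  const NodeSketch* operator->() const { return node_.get(); }

  // True when both handles refer to the same underlying node.
  bool SameAs(const HandleSketch& other) const { return node_ == other.node_; }

 private:
  std::shared_ptr<const NodeSketch> node_;
};

void HandleExample() {
  HandleSketch request("req-1", {"Hello"}, 0.7);
  HandleSketch copy = request;  // copies the handle, not the node
  assert(copy.SameAs(request));
  assert(copy->id == "req-1");
  HandleSketch other("req-1", {"Hello"}, 0.7);
  assert(!other.SameAs(request));  // equal fields, different node
}
```

Copying the handle is cheap because only the reference is duplicated, and exposing the node as const mirrors the "immutable once created" convention described above.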
Static Method: FromUntokenized
static Request FromUntokenized(const Request& request, const Tokenizer& tokenizer);
Creates a new Request with all text data converted to token IDs using the provided tokenizer. The new request retains the same id as the original. This method is used during the request ingestion pipeline to transform raw text inputs into the tokenized form required by the model.
In the returned request, prompt_tokens holds the resolved input length in tokens instead of -1. Because a new Request is created, the original request is not modified.
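The transformation can be sketched in standalone C++ as follows. This is not the MLC-LLM implementation: TextSketch, TokensSketch, RequestSketch, EncodeFn, and ToyEncode are hypothetical stand-ins, the real method takes a Tokenizer rather than a callable, and generation_cfg is omitted.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the text and token Data types in data.h.
struct TextSketch { std::string text; };
struct TokensSketch { std::vector<int> token_ids; };
using DataSketch = std::variant<TextSketch, TokensSketch>;

// Hypothetical stand-in for RequestNode (generation_cfg omitted).
struct RequestSketch {
  std::string id;
  std::vector<DataSketch> inputs;
  int prompt_tokens = -1;
};

// Hypothetical tokenizer interface: text in, token ids out.
using EncodeFn = std::function<std::vector<int>(const std::string&)>;

// Builds a new request with the same id in which every text input is
// replaced by its token ids. The original request is left untouched.
RequestSketch FromUntokenizedSketch(const RequestSketch& request, const EncodeFn& encode) {
  RequestSketch result;
  result.id = request.id;
  result.prompt_tokens = 0;
  for (const DataSketch& input : request.inputs) {
    TokensSketch tokens;
    if (const auto* text = std::get_if<TextSketch>(&input)) {
      tokens.token_ids = encode(text->text);
    } else {
      tokens = std::get<TokensSketch>(input);
    }
    result.prompt_tokens += static_cast<int>(tokens.token_ids.size());
    result.inputs.push_back(std::move(tokens));
  }
  return result;
}

// Toy encoder for the example only: one token per whitespace-separated
// word, using the word length as the token id.
std::vector<int> ToyEncode(const std::string& text) {
  std::vector<int> ids;
  std::istringstream words(text);
  for (std::string word; words >> word;) {
    ids.push_back(static_cast<int>(word.size()));
  }
  return ids;
}

void FromUntokenizedExample() {
  RequestSketch raw{"req-1", {TextSketch{"hello wide world"}, TokensSketch{{7, 8}}}};
  RequestSketch ready = FromUntokenizedSketch(raw, ToyEncode);
  assert(ready.id == raw.id);
  assert(ready.prompt_tokens == 5);  // 3 words + 2 existing tokens
  assert(raw.prompt_tokens == -1);   // original is unchanged
}
```

Returning a fresh object instead of mutating the argument keeps the input request usable as-is, which fits the re-dispatch design goal stated in the overview.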
Role in the Serving Pipeline
The Request object flows through the serving engine as follows:
- A user submits a request via the API layer, which constructs a Request with untokenized text inputs.
- Request::FromUntokenized tokenizes all text data.
- The engine creates a RequestState (see Request State Implementation) to track runtime state.
- The backward reference rstate links the immutable request to its mutable runtime state.
- The request's generation_cfg controls logit processing, sampling, and stopping conditions throughout the generation lifecycle.
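The back-reference step in the flow above can be sketched as a non-owning pointer from the request to state that the engine owns. EngineSketch, RequestStateSketch, and their members are hypothetical; the real rstate field is a type-erased Object*, whereas this sketch uses a typed pointer to stay short.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

struct RequestStateSketch;  // defined below

// Hypothetical stand-in for RequestNode.
struct RequestSketch {
  std::string id;
  RequestStateSketch* rstate = nullptr;  // non-owning back-reference
};

// Hypothetical mutable runtime state tracked per request.
struct RequestStateSketch {
  explicit RequestStateSketch(const RequestSketch* request) : request(request) {}
  const RequestSketch* request;
  int generated_tokens = 0;
};

// Hypothetical engine that owns all runtime states.
class EngineSketch {
 public:
  void AddRequest(RequestSketch* request) {
    auto state = std::make_unique<RequestStateSketch>(request);
    request->rstate = state.get();  // link the request to its state
    states_[request->id] = std::move(state);
  }

  void RemoveRequest(RequestSketch* request) {
    states_.erase(request->id);
    request->rstate = nullptr;  // avoid a dangling pointer
  }

  std::size_t NumStates() const { return states_.size(); }

 private:
  std::unordered_map<std::string, std::unique_ptr<RequestStateSketch>> states_;
};

void BackReferenceExample() {
  EngineSketch engine;
  RequestSketch request{"req-1"};
  engine.AddRequest(&request);
  assert(request.rstate != nullptr);
  // Reach the mutable state directly from the request, without a map lookup.
  request.rstate->generated_tokens += 1;
  assert(request.rstate->generated_tokens == 1);
  assert(request.rstate->request == &request);
  engine.RemoveRequest(&request);
  assert(request.rstate == nullptr && engine.NumStates() == 0);
}
```

Because the pointer is non-owning, whoever destroys the state is responsible for clearing the back-reference, which the sketch does in RemoveRequest.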
See Also
- Request State Implementation -- Runtime state management for requests
- Logit Processor Header -- Uses generation config from requests for logit processing
- Sampler Header -- Uses generation config from requests for sampling decisions
- Metrics Header -- RequestMetrics tracks per-request performance data