Implementation:Mlc ai Mlc llm Engine Interface
| Knowledge Sources | |
|---|---|
| Domains | LLM Serving, Engine Architecture, Request Management |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
The Engine Interface defines the abstract base class for the MLC LLM serving engine. It establishes the contract for request-based text generation and supports one or more LLMs (multiple models are used for speculative inference). The engine manages each request's lifecycle from submission through generation to completion, and delivers results through a stream callback function.
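The lifecycle described above can be pictured as a small state machine. This is a simplified model, not the engine's own bookkeeping. The names RequestPhase, kWaiting, kRunning, kFinished, kAborted and CanTransition are hypothetical and do not appear in engine.h.

```cpp
#include <cassert>

// Hypothetical, simplified request lifecycle (not the engine's real request-state types).
enum class RequestPhase { kWaiting, kRunning, kFinished, kAborted };

// Returns true if this simplified lifecycle allows moving from `from` to `to`:
// waiting -> running (prefill), running -> finished (generation completes),
// and any non-terminal phase -> aborted.
bool CanTransition(RequestPhase from, RequestPhase to) {
  switch (from) {
    case RequestPhase::kWaiting:
      return to == RequestPhase::kRunning || to == RequestPhase::kAborted;
    case RequestPhase::kRunning:
      return to == RequestPhase::kFinished || to == RequestPhase::kAborted;
    case RequestPhase::kFinished:
    case RequestPhase::kAborted:
      return false;  // terminal phases accept no further transitions
  }
  return false;
}
```

Making finished and aborted terminal matches the contract that a request is delivered to completion or cancelled, but never both.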
Description
This header file (cpp/serve/engine.h) declares the Engine class as an abstract interface (all operations except the static factory Create are pure virtual). Its operations fall into four categories:
Engine Management:
- Create: Static factory method that constructs an engine from a JSON config string, a device specification, a stream callback function, and an optional trace recorder. Returns a Result<EngineCreationOutput> containing the engine instance, the completed engine config, and the default generation config.
- Reset: Clears all running data and resets metrics.
- Empty: Returns true when the engine has no pending requests.
- GetRequestStreamCallback / SetRequestStreamCallback: Accessor and mutator for the stream callback function.
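Create returns a Result rather than throwing, so callers must check for failure before they touch the engine. The sketch below shows that pattern with hypothetical stand-ins. The Result template, ToyEngine and CreateToyEngine are illustrative only. The stand-in Result simply mirrors the IsOk/Unwrap calls used in the usage example on this page.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a Result<T>: holds either a value or an error message.
template <typename T>
class Result {
 public:
  static Result Ok(T value) {
    Result result;
    result.value_ = std::move(value);
    return result;
  }
  static Result Err(std::string message) {
    Result result;
    result.error_ = std::move(message);
    return result;
  }
  bool IsOk() const { return value_.has_value(); }
  // Moves the value out; only valid when IsOk() is true.
  T Unwrap() {
    assert(IsOk() && "Unwrap() called on an error result");
    return std::move(*value_);
  }
  const std::string& ErrorMessage() const { return error_; }

 private:
  Result() = default;
  std::optional<T> value_;
  std::string error_;
};

// Hypothetical, heavily simplified engine and config types.
struct EngineConfig {
  std::string model;
  int max_num_sequence = 4;  // a default that "completion" leaves in place
};
struct GenerationConfig {
  double temperature = 1.0;
};
class Engine {
 public:
  virtual ~Engine() = default;
};
class ToyEngine : public Engine {};

struct EngineCreationOutput {
  std::unique_ptr<Engine> reloaded_engine;
  EngineConfig completed_engine_config;
  GenerationConfig default_generation_cfg;
};

// Sketch of the factory pattern: validate input, fill defaults, and return either
// the bundle or an error. (The real Create parses a JSON string; here the
// "config" is just a model path.)
Result<EngineCreationOutput> CreateToyEngine(const std::string& model_path) {
  if (model_path.empty()) {
    return Result<EngineCreationOutput>::Err("config must name a model");
  }
  EngineCreationOutput out;
  out.reloaded_engine = std::make_unique<ToyEngine>();
  out.completed_engine_config.model = model_path;
  return Result<EngineCreationOutput>::Ok(std::move(out));
}
```

Returning the engine together with the completed configs lets the caller see which defaults were filled in, without querying the engine afterward.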
Request Management:
- AddRequest: Submits a new Request to the engine's queue.
- AbortRequest: Cancels a specific request by its ID string.
- AbortAllRequests: Cancels all pending and running requests.
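Here is a minimal sketch of these request-management operations, assuming a waiting queue plus a running set. RequestQueues, its Admit method and the simplified Request struct are hypothetical. AbortRequest returns bool here only to make the sketch checkable; the interface's AbortRequest returns void.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified request: just an ID and a token budget.
struct Request {
  std::string id;
  int max_tokens;
};

// Sketch of the request-management half of an engine (two-queue layout is an assumption).
class RequestQueues {
 public:
  void AddRequest(Request request) { waiting_.push_back(std::move(request)); }

  // Moves up to `n` waiting requests into the running set (what prefill would do).
  void Admit(std::size_t n) {
    for (std::size_t i = 0; i < n && !waiting_.empty(); ++i) {
      running_.push_back(std::move(waiting_.front()));
      waiting_.pop_front();
    }
  }

  // Removes the request with this ID from whichever queue holds it.
  bool AbortRequest(const std::string& id) {
    auto matches = [&id](const Request& req) { return req.id == id; };
    auto in_waiting = std::find_if(waiting_.begin(), waiting_.end(), matches);
    if (in_waiting != waiting_.end()) {
      waiting_.erase(in_waiting);
      return true;
    }
    auto in_running = std::find_if(running_.begin(), running_.end(), matches);
    if (in_running != running_.end()) {
      running_.erase(in_running);
      return true;
    }
    return false;  // unknown ID
  }

  void AbortAllRequests() {
    waiting_.clear();
    running_.clear();
  }

  bool Empty() const { return waiting_.empty() && running_.empty(); }
  std::size_t NumWaiting() const { return waiting_.size(); }
  std::size_t NumRunning() const { return running_.size(); }

 private:
  std::deque<Request> waiting_;
  std::vector<Request> running_;
};
```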
Engine Action:
- Step: The core function driving the engine's main loop. Each invocation may perform prefill for new requests, decode for running requests, or other actions. After certain actions (e.g., decode), the engine checks for finished requests and delivers results via the callback.
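The sketch below illustrates one invocation of Step. Its scheduling policy is an assumption chosen for clarity: prefill one waiting request per step, otherwise decode one token for every running request. The real engine delegates this work to its action objects and may batch differently. ToyStepper, ToyRequest and StreamOutput are hypothetical. For simplicity the sketch checks for finished requests after every step, not only after decode.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stream output: request ID, tokens generated so far, finished flag.
struct StreamOutput {
  std::string request_id;
  int num_generated;
  bool finished;
};

struct ToyRequest {
  std::string id;
  int max_tokens;
  int generated = 0;
};

class ToyStepper {
 public:
  explicit ToyStepper(std::function<void(std::vector<StreamOutput>)> callback)
      : callback_(std::move(callback)) {}

  void AddRequest(ToyRequest request) { waiting_.push_back(std::move(request)); }
  bool Empty() const { return waiting_.empty() && running_.empty(); }

  void Step() {
    std::vector<StreamOutput> outputs;
    if (!waiting_.empty()) {
      // Prefill: process one new prompt and emit its first token.
      running_.push_back(waiting_.front());
      waiting_.pop_front();
      running_.back().generated = 1;
      outputs.push_back(Report(running_.back()));
    } else {
      // Decode: one new token for every running request.
      for (ToyRequest& r : running_) {
        ++r.generated;
        outputs.push_back(Report(r));
      }
    }
    // Drop requests that reached their budget; they were already reported above.
    std::vector<ToyRequest> still_running;
    for (const ToyRequest& r : running_) {
      if (r.generated < r.max_tokens) still_running.push_back(r);
    }
    running_ = std::move(still_running);
    if (!outputs.empty()) callback_(std::move(outputs));
  }

 private:
  static StreamOutput Report(const ToyRequest& r) {
    return {r.id, r.generated, r.generated >= r.max_tokens};
  }

  std::function<void(std::vector<StreamOutput>)> callback_;
  std::deque<ToyRequest> waiting_;
  std::vector<ToyRequest> running_;
};
```

Results are batched per step and passed to the callback in a single call. With requests "a" (3 tokens) and "b" (1 token), this policy drains in four steps: prefill a, prefill b (which finishes immediately), then two decodes of a.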
Debug/Profile:
- JSONMetrics: Returns internal engine metrics as a JSON string.
- DebugCallFuncOnAllAllWorker: Invokes a named global function on all workers (debug only).
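As a rough illustration of how JSONMetrics and Reset relate, the sketch below accumulates counters per step, serializes them as a flat JSON object, and clears them on reset. ToyMetrics and its field names are hypothetical. They are not the engine's actual metric keys.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical metrics record with illustrative field names.
struct ToyMetrics {
  std::int64_t num_prefill_tokens = 0;
  std::int64_t num_decode_tokens = 0;
  std::int64_t num_finished_requests = 0;

  // Accumulates the work done by one engine step; counters only grow.
  void RecordStep(std::int64_t prefill_tokens, std::int64_t decode_tokens, std::int64_t finished) {
    assert(prefill_tokens >= 0 && decode_tokens >= 0 && finished >= 0);
    num_prefill_tokens += prefill_tokens;
    num_decode_tokens += decode_tokens;
    num_finished_requests += finished;
  }

  // Clears all counters, as Reset() does for the engine's metrics.
  void Reset() { *this = ToyMetrics(); }

  // Serializes the counters as a flat JSON object.
  std::string AsJSON() const {
    std::ostringstream os;
    os << "{\"num_prefill_tokens\":" << num_prefill_tokens
       << ",\"num_decode_tokens\":" << num_decode_tokens
       << ",\"num_finished_requests\":" << num_finished_requests << "}";
    return os.str();
  }
};
```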
The header also defines:
- EngineCreationOutput: A struct bundling the created engine (as std::unique_ptr<Engine>), the completed EngineConfig, and the default GenerationConfig.
- AbortRequestImpl: A free function implementing request abortion logic that can be called from engine action implementations.
Usage
The engine is the central orchestrator of the MLC LLM serving system:
- An Engine is created via Engine::Create() with a JSON config specifying models, devices, and serving parameters.
- Clients submit requests via AddRequest().
- A main loop repeatedly calls Step(), which drives prefill, decode, and other actions.
- Results flow back through the FRequestStreamCallback as RequestStreamOutput objects.
- Requests can be aborted individually or collectively at any time.
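On the client side, the stream callback typically forwards each batch of outputs to a consumer that reassembles per-request results. The sketch below assumes each output carries a text delta and a finish reason that is empty until the request completes. DeltaOutput and StreamCollector are hypothetical and simplify the real RequestStreamOutput.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical delta output for one request.
struct DeltaOutput {
  std::string request_id;
  std::string delta_text;     // text produced since the previous callback
  std::string finish_reason;  // empty while the request is still generating
};

// Sketch of a consumer that stitches deltas back together per request.
class StreamCollector {
 public:
  void OnOutputs(const std::vector<DeltaOutput>& outputs) {
    for (const DeltaOutput& out : outputs) {
      // A finished request should not produce further output.
      assert(finish_reasons_.count(out.request_id) == 0);
      text_[out.request_id] += out.delta_text;
      if (!out.finish_reason.empty()) finish_reasons_[out.request_id] = out.finish_reason;
    }
  }

  std::string Text(const std::string& id) const {
    auto it = text_.find(id);
    return it == text_.end() ? std::string() : it->second;
  }

  bool Finished(const std::string& id) const { return finish_reasons_.count(id) > 0; }

  std::string FinishReason(const std::string& id) const {
    auto it = finish_reasons_.find(id);
    return it == finish_reasons_.end() ? std::string() : it->second;
  }

 private:
  std::map<std::string, std::string> text_;            // request ID -> accumulated text
  std::map<std::string, std::string> finish_reasons_;  // request ID -> finish reason
};
```

Keying everything by request ID lets a single callback serve many concurrent requests, because outputs for different requests can arrive interleaved within one batch.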
Code Reference
Source Location
| Property | Value |
|---|---|
| File | cpp/serve/engine.h |
| Namespace | mlc::llm::serve |
| Lines | 126 |
| Include Guard | MLC_LLM_SERVE_ENGINE_H_ |
Signature
```cpp
namespace mlc {
namespace llm {
namespace serve {

class Engine;  // Forward declaration: EngineCreationOutput holds a std::unique_ptr<Engine>.

struct EngineCreationOutput {
  std::unique_ptr<Engine> reloaded_engine;
  EngineConfig completed_engine_config;
  GenerationConfig default_generation_cfg;
};

class Engine {
 public:
  virtual ~Engine() = default;

  // Engine Management
  static Result<EngineCreationOutput> Create(const std::string& engine_config_json_str,
                                             Device device,
                                             FRequestStreamCallback request_stream_callback,
                                             Optional<EventTraceRecorder> trace_recorder);
  virtual void Reset() = 0;
  virtual bool Empty() = 0;
  virtual FRequestStreamCallback GetRequestStreamCallback() = 0;
  virtual void SetRequestStreamCallback(FRequestStreamCallback request_stream_callback) = 0;

  // Request Management
  virtual void AddRequest(Request request) = 0;
  virtual void AbortRequest(const String& request_id) = 0;
  virtual void AbortAllRequests() = 0;

  // Engine Action
  virtual void Step() = 0;

  // Debug/Profile
  virtual String JSONMetrics() = 0;
  virtual void DebugCallFuncOnAllAllWorker(const String& func_name,
                                           Optional<String> func_args) = 0;
};

void AbortRequestImpl(EngineState estate, const Array<Model>& models,
                      const String& request_id, String finish_reason = "abort");

}  // namespace serve
}  // namespace llm
}  // namespace mlc
```
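The signature follows a common C++ pattern: an abstract base class with a static factory that returns the concrete engine through a base-class std::unique_ptr. Callers therefore never name the implementation type. The miniature below illustrates that pattern, including the forward declaration and the callback accessor/mutator pair. MiniEngine, MiniEngineImpl, MiniCreationOutput and Callback are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

using Callback = std::function<void(const std::string&)>;

class MiniEngine;  // forward declaration so the output struct can hold a pointer to it

struct MiniCreationOutput {
  std::unique_ptr<MiniEngine> reloaded_engine;
  std::string completed_config;
};

// Abstract interface: a static factory plus pure virtual operations.
class MiniEngine {
 public:
  virtual ~MiniEngine() = default;
  static MiniCreationOutput Create(const std::string& config, Callback callback);
  virtual bool Empty() = 0;
  virtual Callback GetRequestStreamCallback() = 0;
  virtual void SetRequestStreamCallback(Callback callback) = 0;
};

namespace {
// Concrete implementation; callers only ever see MiniEngine.
class MiniEngineImpl : public MiniEngine {
 public:
  explicit MiniEngineImpl(Callback callback) : callback_(std::move(callback)) {}
  bool Empty() override { return true; }  // this sketch has no request queue
  Callback GetRequestStreamCallback() override { return callback_; }
  void SetRequestStreamCallback(Callback callback) override { callback_ = std::move(callback); }

 private:
  Callback callback_;
};
}  // namespace

MiniCreationOutput MiniEngine::Create(const std::string& config, Callback callback) {
  assert(callback && "a stream callback is required");
  MiniCreationOutput out;
  out.reloaded_engine = std::make_unique<MiniEngineImpl>(std::move(callback));
  out.completed_config = config + " (defaults filled)";
  return out;
}
```

The virtual destructor ensures the concrete engine is destroyed correctly through the base-class pointer. The mutator lets a server swap the callback after creation without rebuilding the engine.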
Import
```cpp
#include "serve/engine.h"
```
Dependencies:
- data.h for serving data types and RequestStreamOutput
- engine_state.h for EngineState and EngineConfig
- event_trace_recorder.h for EventTraceRecorder
- request.h for the Request class
- request_state.h for request state tracking
I/O Contract
Engine::Create
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | engine_config_json_str | const std::string& | JSON string specifying models, device, KV cache size, etc. |
| Input | device | Device | The device to run models on (e.g., GPU) |
| Input | request_stream_callback | FRequestStreamCallback | Callback function for streaming generation results |
| Input | trace_recorder | Optional<EventTraceRecorder> | Optional event trace recorder for profiling |
| Output | (return) | Result<EngineCreationOutput> | The engine, completed config, and default generation config, or an error |
Engine::AddRequest
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | request | Request | A request object containing input data, generation config, and request ID |
Engine::Step
| Direction | Name | Type | Description |
|---|---|---|---|
| Side Effect | (callback) | FRequestStreamCallback | May invoke the stream callback with RequestStreamOutput for finished or in-progress requests |
AbortRequestImpl
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | estate | EngineState | The current engine state to modify |
| Input | models | const Array<Model>& | Models whose KV caches may need cleanup |
| Input | request_id | const String& | ID of the request to abort |
| Input | finish_reason | String | Reason for abortion (default: "abort") |
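The parameters above suggest how abortion proceeds, sketched here under explicit assumptions. A running request owns KV-cache entries in every model (for example, the base and draft models in speculative mode), so those entries are released. A waiting request only needs to leave the queue. In either case the client receives one final output carrying finish_reason. ToyModel, ToyEngineState, FinalOutput and ToyAbortRequest are hypothetical stand-ins, not the real AbortRequestImpl.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-model KV cache that tracks the pages each request uses.
struct ToyModel {
  std::map<std::string, int> kv_cache_pages;  // request ID -> pages in use
  void RemoveSequence(const std::string& id) { kv_cache_pages.erase(id); }
};

struct FinalOutput {
  std::string request_id;
  std::string finish_reason;
};

// Hypothetical engine state: queued request IDs plus the stream callback.
struct ToyEngineState {
  std::vector<std::string> waiting;
  std::vector<std::string> running;
  std::function<void(const FinalOutput&)> callback;
};

void ToyAbortRequest(ToyEngineState* state, std::vector<ToyModel>& models,
                     const std::string& request_id,
                     const std::string& finish_reason = "abort") {
  assert(state != nullptr);
  auto running_it = std::find(state->running.begin(), state->running.end(), request_id);
  if (running_it != state->running.end()) {
    // Running requests hold KV-cache entries in every model: release them all.
    state->running.erase(running_it);
    for (ToyModel& model : models) model.RemoveSequence(request_id);
  } else {
    auto waiting_it = std::find(state->waiting.begin(), state->waiting.end(), request_id);
    if (waiting_it == state->waiting.end()) return;  // unknown ID: nothing to report
    state->waiting.erase(waiting_it);
  }
  // Tell the client the request ended, and why.
  state->callback(FinalOutput{request_id, finish_reason});
}
```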
Usage Examples
Creating and running an engine:
```cpp
#include "serve/engine.h"

// Define the stream callback
auto callback = [](RequestStreamOutput output) {
  // Process incremental generation results
  LOG(INFO) << "Got output for request: " << output->request_id;
};

// Create the engine (device is assumed to be defined elsewhere, e.g. a GPU device)
std::string config_json = R"({"model": "dist/llama-2-7b-q4f16_1", ...})";
Result<EngineCreationOutput> result =
    Engine::Create(config_json, device, callback, Optional<EventTraceRecorder>());
if (result.IsOk()) {
  auto output = result.Unwrap();
  auto& engine = output.reloaded_engine;

  // Add a request
  Request request = /* construct request */;
  engine->AddRequest(request);

  // Main serving loop: step until no requests remain
  while (!engine->Empty()) {
    engine->Step();
  }
}
```
Aborting a request:
```cpp
// Assumes engine is the std::unique_ptr<Engine> obtained in the creation example.
// Abort a specific request
engine->AbortRequest("req-12345");
// Abort all requests (e.g., during shutdown)
engine->AbortAllRequests();
```
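One way to implement "abort all" is to reuse the per-request abort logic. The catch is that aborting mutates the very queues being scanned. The sketch below is a hypothetical implementation, not necessarily how the engine does it: it takes a snapshot of the IDs first, then aborts each one. ShutdownDemo and its members are illustrative names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical engine fragment: queued request IDs plus a log of the IDs that
// would be reported to the stream callback with finish reason "abort".
struct ShutdownDemo {
  std::vector<std::string> waiting;
  std::vector<std::string> running;
  std::vector<std::string> aborted;

  void AbortRequest(const std::string& id) {
    for (std::vector<std::string>* queue : {&running, &waiting}) {
      for (auto it = queue->begin(); it != queue->end(); ++it) {
        if (*it == id) {
          queue->erase(it);  // safe: we return immediately after erasing
          aborted.push_back(id);
          return;
        }
      }
    }
  }

  // Aborting mutates the queues, so snapshot the IDs first instead of
  // erasing from the containers while iterating over them.
  void AbortAllRequests() {
    std::vector<std::string> ids = running;
    ids.insert(ids.end(), waiting.begin(), waiting.end());
    for (const std::string& id : ids) AbortRequest(id);
    assert(running.empty() && waiting.empty());
  }
};
```

Routing every cancellation through the single-request path means each aborted request gets exactly one final report, whether it was cancelled alone or during shutdown.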
Related Pages
- Mlc_ai_Mlc_llm_Engine_Action - The action abstraction that implements Step behavior
- Mlc_ai_Mlc_llm_Serve_Data_Header - Data types processed by the engine
- Mlc_ai_Mlc_llm_Serve_Data - Data implementation used in the serving pipeline
- Mlc_ai_Mlc_llm_OpenAI_API_Protocol_Header - API protocol for requests and responses
- Mlc_ai_Mlc_llm_Draft_Token_Workspace - Workspace manager used in speculative decoding mode
- Mlc_ai_Mlc_llm_Model_Metadata_Header - Model metadata consulted during engine creation