Implementation:Mlc ai Mlc llm Engine Interface

From Leeroopedia


Knowledge Sources
Domains | LLM Serving, Engine Architecture, Request Management
Last Updated | 2026-02-09 19:00 GMT

Overview

The Engine Interface defines the abstract base class for the MLC LLM serving engine. It establishes the contract for request-based text generation with either a single LLM or multiple LLMs (the multi-model case supports speculative inference). The engine manages each request's lifecycle from submission through generation to completion and streams results back through a request stream callback.

Description

The header file cpp/serve/engine.h declares the Engine class as an abstract interface (pure virtual methods plus a static factory) whose operations fall into four categories:

Engine Management:

  • Create -- Static factory method that constructs an engine from a JSON config string, a device specification, a stream callback, and an optional trace recorder. Returns a Result<EngineCreationOutput> holding either an error or the engine instance, the completed engine config, and the default generation config.
  • Reset -- Clears all running data and resets metrics.
  • Empty -- Returns true when the engine has no pending or running requests left to process.
  • GetRequestStreamCallback / SetRequestStreamCallback -- Accessor and mutator for the stream callback function.

Request Management:

  • AddRequest -- Submits a new Request to the engine's queue.
  • AbortRequest -- Cancels a specific request by its ID string.
  • AbortAllRequests -- Cancels all pending and running requests.

Engine Action:

  • Step -- The core function driving the engine's main loop. Each invocation may perform prefill for new requests, decode for running requests, or other actions. After certain actions (e.g., decode), the engine checks for finished requests and delivers results via the callback.

Debug/Profile:

  • JSONMetrics -- Returns internal engine metrics as a JSON string.
  • DebugCallFuncOnAllAllWorker -- Invokes a named global function on all workers (debug only).

The header also defines:

  • EngineCreationOutput: A struct bundling the created engine (as std::unique_ptr<Engine>), the completed EngineConfig, and the default GenerationConfig.
  • AbortRequestImpl: A free function implementing request abortion logic that can be called from engine action implementations.
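As a rough illustration of the contract described above, the following self-contained C++17 sketch mirrors the interface's shape: an abstract base whose pure virtual methods are grouped by category, a static factory returning a std::unique_ptr, and a toy implementation. Every name here (SketchEngine, ToyEngine, SketchRequest, SketchOutput, SketchCallback) is hypothetical. The real interface uses TVM runtime types such as String, Request, and Result, and its factory can fail; this toy engine simply produces one token per request per Step.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for MLC LLM's Request and RequestStreamOutput.
struct SketchRequest {
  std::string id;
  int max_tokens;  // how many tokens this toy request will generate (>= 1)
};

struct SketchOutput {
  std::string request_id;
  int tokens_so_far;  // toy payload: tokens generated for this request so far
  bool finished;
};

// The callback receives all outputs produced by one Step as a batch.
using SketchCallback = std::function<void(const std::vector<SketchOutput>&)>;

// Abstract interface grouped like engine.h (debug/profile methods omitted).
class SketchEngine {
 public:
  virtual ~SketchEngine() = default;
  static std::unique_ptr<SketchEngine> Create(SketchCallback callback);

  // Engine management
  virtual void Reset() = 0;
  virtual bool Empty() = 0;
  virtual SketchCallback GetRequestStreamCallback() = 0;
  virtual void SetRequestStreamCallback(SketchCallback callback) = 0;
  // Request management
  virtual void AddRequest(SketchRequest request) = 0;
  virtual void AbortRequest(const std::string& request_id) = 0;
  virtual void AbortAllRequests() = 0;
  // Engine action
  virtual void Step() = 0;
};

// Toy implementation: each Step generates one token for every live request.
class ToyEngine : public SketchEngine {
 public:
  explicit ToyEngine(SketchCallback callback) : callback_(std::move(callback)) {}

  void Reset() override { requests_.clear(); }
  bool Empty() override { return requests_.empty(); }
  SketchCallback GetRequestStreamCallback() override { return callback_; }
  void SetRequestStreamCallback(SketchCallback callback) override {
    callback_ = std::move(callback);
  }
  void AddRequest(SketchRequest request) override {
    requests_.push_back(Entry{std::move(request), 0});
  }
  void AbortRequest(const std::string& request_id) override {
    for (auto it = requests_.begin(); it != requests_.end(); ++it) {
      if (it->request.id == request_id) { requests_.erase(it); return; }
    }
  }
  void AbortAllRequests() override { requests_.clear(); }

  void Step() override {
    std::vector<SketchOutput> outputs;
    std::deque<Entry> still_live;
    for (Entry& e : requests_) {
      ++e.generated;
      bool done = e.generated >= e.request.max_tokens;
      outputs.push_back(SketchOutput{e.request.id, e.generated, done});
      if (!done) still_live.push_back(std::move(e));  // finished ones are dropped
    }
    requests_ = std::move(still_live);
    if (!outputs.empty() && callback_) callback_(outputs);
  }

 private:
  struct Entry {
    SketchRequest request;
    int generated;
  };
  std::deque<Entry> requests_;
  SketchCallback callback_;
};

std::unique_ptr<SketchEngine> SketchEngine::Create(SketchCallback callback) {
  return std::make_unique<ToyEngine>(std::move(callback));
}

// Creates an engine, submits two requests, steps until empty, counts outputs.
int RunToyEngine() {
  int outputs_seen = 0;
  std::unique_ptr<SketchEngine> engine = SketchEngine::Create(
      [&outputs_seen](const std::vector<SketchOutput>& batch) {
        outputs_seen += static_cast<int>(batch.size());
      });
  engine->AddRequest(SketchRequest{"a", 2});
  engine->AddRequest(SketchRequest{"b", 3});
  while (!engine->Empty()) engine->Step();
  return outputs_seen;
}
```

Client code holds only a std::unique_ptr to the abstract base, so the concrete engine behind the factory can change without touching callers.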

Usage

The engine is the central orchestrator of the MLC LLM serving system:

  1. An Engine is created via Engine::Create() with a JSON config specifying models, devices, and serving parameters.
  2. Clients submit requests via AddRequest().
  3. A main loop repeatedly calls Step(), which drives prefill, decode, and other actions.
  4. Results flow back through the FRequestStreamCallback as RequestStreamOutput objects.
  5. Requests can be aborted individually or collectively at any time.
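The five steps can be traced end to end with a deliberately tiny stand-in engine. In this sketch, ToyEngine, StreamOutput, the "length" finish reason, and the choice to report aborts through the same callback are all assumptions for illustration; only the overall create / add / step / callback / abort flow comes from the description above.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stream output: an empty finish_reason means "still generating".
struct StreamOutput {
  std::string request_id;
  std::string finish_reason;
};
using StreamCallback = std::function<void(const std::vector<StreamOutput>&)>;

// Toy engine (not MLC LLM's): each Step advances every live request by one token.
class ToyEngine {
 public:
  explicit ToyEngine(StreamCallback cb) : cb_(std::move(cb)) {}
  // max_tokens must be at least 1.
  void AddRequest(const std::string& id, int max_tokens) { live_[id] = max_tokens; }
  bool Empty() const { return live_.empty(); }
  void AbortRequest(const std::string& id) {
    if (live_.erase(id) > 0) {
      std::vector<StreamOutput> batch{StreamOutput{id, "abort"}};
      cb_(batch);  // assumption: aborts are reported through the same callback
    }
  }
  void Step() {
    std::vector<StreamOutput> batch;
    for (auto it = live_.begin(); it != live_.end();) {
      if (--it->second == 0) {  // last token generated: the request is finished
        batch.push_back(StreamOutput{it->first, "length"});
        it = live_.erase(it);
      } else {
        batch.push_back(StreamOutput{it->first, ""});
        ++it;
      }
    }
    if (!batch.empty()) cb_(batch);
  }

 private:
  std::map<std::string, int> live_;  // request id -> tokens still to generate
  StreamCallback cb_;
};

// Walks through the five usage steps; returns the finish reason of each request.
std::map<std::string, std::string> RunUsageFlow() {
  std::map<std::string, std::string> finish_reasons;
  // Steps 1 and 4: create the engine with a callback that records final outputs.
  ToyEngine engine([&finish_reasons](const std::vector<StreamOutput>& batch) {
    for (const StreamOutput& out : batch) {
      if (!out.finish_reason.empty()) finish_reasons[out.request_id] = out.finish_reason;
    }
  });
  // Step 2: submit requests.
  engine.AddRequest("req-1", 2);
  engine.AddRequest("req-2", 5);
  // Step 3: drive the main loop. Step 5: abort req-2 once the first Step is done.
  bool first_step = true;
  while (!engine.Empty()) {
    engine.Step();
    if (first_step) {
      engine.AbortRequest("req-2");
      first_step = false;
    }
  }
  return finish_reasons;
}
```

Here req-1 runs to completion while req-2 is cut short, so the callback sees one natural finish and one abort.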

Code Reference

Source Location

Property | Value
File | cpp/serve/engine.h
Namespace | mlc::llm::serve
Lines | 126
Include Guard | MLC_LLM_SERVE_ENGINE_H_

Signature

namespace mlc {
namespace llm {
namespace serve {

// Forward declaration: EngineCreationOutput holds a std::unique_ptr<Engine>
class Engine;

struct EngineCreationOutput {
  std::unique_ptr<Engine> reloaded_engine;
  EngineConfig completed_engine_config;
  GenerationConfig default_generation_cfg;
};

class Engine {
 public:
  virtual ~Engine() = default;

  // Engine Management
  static Result<EngineCreationOutput> Create(
      const std::string& engine_config_json_str,
      Device device,
      FRequestStreamCallback request_stream_callback,
      Optional<EventTraceRecorder> trace_recorder);

  virtual void Reset() = 0;
  virtual bool Empty() = 0;
  virtual FRequestStreamCallback GetRequestStreamCallback() = 0;
  virtual void SetRequestStreamCallback(FRequestStreamCallback request_stream_callback) = 0;

  // Request Management
  virtual void AddRequest(Request request) = 0;
  virtual void AbortRequest(const String& request_id) = 0;
  virtual void AbortAllRequests() = 0;

  // Engine Action
  virtual void Step() = 0;

  // Debug/Profile
  virtual String JSONMetrics() = 0;
  virtual void DebugCallFuncOnAllAllWorker(const String& func_name,
                                           Optional<String> func_args) = 0;
};

void AbortRequestImpl(EngineState estate, const Array<Model>& models,
                       const String& request_id, String finish_reason = "abort");

}  // namespace serve
}  // namespace llm
}  // namespace mlc

Import

#include "serve/engine.h"

Dependencies:

  • data.h for serving data types and RequestStreamOutput
  • engine_state.h for EngineState and EngineConfig
  • event_trace_recorder.h for EventTraceRecorder
  • request.h for the Request class
  • request_state.h for request state tracking

I/O Contract

Engine::Create

Direction | Name | Type | Description
Input | engine_config_json_str | const std::string& | JSON string specifying models, device, KV cache size, etc.
Input | device | Device | The device to run models on (e.g., a GPU)
Input | request_stream_callback | FRequestStreamCallback | Callback function for streaming generation results
Input | trace_recorder | Optional<EventTraceRecorder> | Optional event trace recorder for profiling
Output | (return) | Result<EngineCreationOutput> | The engine, completed config, and default generation config, or an error
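Create reports failure through its Result return value rather than by throwing. The sketch below imitates that pattern with a hypothetical Result<T> built on std::variant and shows one plausible meaning of "completing" a config: fields the caller left unset are filled with defaults before the engine is built. All type names, config fields, and default values here are illustrative assumptions, not MLC LLM's actual definitions or defaults.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <variant>

// Hypothetical minimal Result<T>: holds either a value or an error message.
template <typename T>
class Result {
 public:
  static Result Ok(T value) { return Result(std::move(value)); }
  static Result Error(std::string message) { return Result(Err{std::move(message)}); }
  bool IsOk() const { return std::holds_alternative<T>(data_); }
  T Unwrap() { return std::move(std::get<T>(data_)); }  // precondition: IsOk()
  const std::string& UnwrapErr() const { return std::get<Err>(data_).message; }

 private:
  struct Err {
    std::string message;
  };
  explicit Result(T value) : data_(std::move(value)) {}
  explicit Result(Err err) : data_(std::move(err)) {}
  std::variant<T, Err> data_;
};

// Hypothetical partial engine config; unset fields are completed by CreateEngine.
struct EngineConfigSketch {
  std::string model;
  std::optional<int> max_num_sequence;
  std::optional<int> max_total_sequence_length;
};

struct GenerationConfigSketch {
  double temperature = 1.0;
  int max_tokens = 128;
};

class EngineSketch {
 public:
  explicit EngineSketch(EngineConfigSketch config) : config_(std::move(config)) {}
  const EngineConfigSketch& config() const { return config_; }

 private:
  EngineConfigSketch config_;
};

// Mirrors the three members of EngineCreationOutput.
struct CreationOutputSketch {
  std::unique_ptr<EngineSketch> reloaded_engine;
  EngineConfigSketch completed_engine_config;
  GenerationConfigSketch default_generation_cfg;
};

// Factory that validates and completes the config, reporting errors via Result.
Result<CreationOutputSketch> CreateEngine(EngineConfigSketch config) {
  if (config.model.empty()) {
    return Result<CreationOutputSketch>::Error("engine config must name a model");
  }
  // Fill fields the caller left unset (toy defaults, not MLC LLM's values).
  if (!config.max_num_sequence) config.max_num_sequence = 4;
  if (!config.max_total_sequence_length) config.max_total_sequence_length = 4096;
  CreationOutputSketch out;
  out.reloaded_engine = std::make_unique<EngineSketch>(config);
  out.completed_engine_config = config;
  return Result<CreationOutputSketch>::Ok(std::move(out));
}
```

Returning the completed config alongside the engine lets a caller see exactly which values were filled in on its behalf.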

Engine::AddRequest

Direction | Name | Type | Description
Input | request | Request | A request object containing the input data, generation config, and request ID

Engine::Step

Direction | Name | Type | Description
Side effect | (callback) | FRequestStreamCallback | May invoke the stream callback with RequestStreamOutput objects for finished or in-progress requests
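The Step contract leaves the choice of action to the engine. The following sketch shows one simplified policy, assumed for illustration rather than taken from MLC LLM: prefill waiting requests first, otherwise decode one token for every running request, then retire finished requests and deliver that step's deltas in a single callback. ToyStepEngine, ToyRequest, DeltaOutput, and the prefill-first rule are hypothetical.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical delta output: tokens produced for one request in one Step.
struct DeltaOutput {
  std::string request_id;
  int num_new_tokens;
  bool finished;
};
using BatchCallback = std::function<void(const std::vector<DeltaOutput>&)>;

struct ToyRequest {
  std::string id;
  int max_tokens;
  int generated = 0;
};

// Toy scheduler, NOT MLC LLM's actual policy: prefill waiting requests first,
// otherwise decode one token for every running request.
class ToyStepEngine {
 public:
  explicit ToyStepEngine(BatchCallback cb) : cb_(std::move(cb)) {}
  void AddRequest(std::string id, int max_tokens) {
    waiting_.push_back(ToyRequest{std::move(id), max_tokens});
  }
  bool Empty() const { return waiting_.empty() && running_.empty(); }

  // Performs one action and returns its name so callers can observe the policy.
  std::string Step() {
    std::vector<DeltaOutput> deltas;
    std::string action = "idle";
    if (!waiting_.empty()) {
      action = "prefill";  // admit all waiting requests; each yields its first token
      while (!waiting_.empty()) {
        running_.push_back(std::move(waiting_.front()));
        waiting_.pop_front();
        Advance(running_.back(), deltas);
      }
    } else if (!running_.empty()) {
      action = "decode";  // one new token for every running request
      for (ToyRequest& r : running_) Advance(r, deltas);
    }
    // Retire finished requests, then deliver this step's deltas in one callback.
    std::deque<ToyRequest> still_running;
    for (ToyRequest& r : running_) {
      if (r.generated < r.max_tokens) still_running.push_back(std::move(r));
    }
    running_ = std::move(still_running);
    if (!deltas.empty()) cb_(deltas);
    return action;
  }

 private:
  static void Advance(ToyRequest& r, std::vector<DeltaOutput>& deltas) {
    ++r.generated;
    deltas.push_back(DeltaOutput{r.id, 1, r.generated >= r.max_tokens});
  }
  std::deque<ToyRequest> waiting_, running_;
  BatchCallback cb_;
};

// Runs two requests to completion; reports each Step's action and total tokens.
std::vector<std::string> RunSteps(int& tokens_streamed) {
  tokens_streamed = 0;
  ToyStepEngine engine([&tokens_streamed](const std::vector<DeltaOutput>& batch) {
    for (const DeltaOutput& d : batch) tokens_streamed += d.num_new_tokens;
  });
  engine.AddRequest("a", 3);
  engine.AddRequest("b", 1);
  std::vector<std::string> actions;
  while (!engine.Empty()) actions.push_back(engine.Step());
  return actions;
}
```

With requests of 3 and 1 tokens, the first Step is a prefill that also finishes "b", and two decode Steps finish "a".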

AbortRequestImpl

Direction | Name | Type | Description
Input | estate | EngineState | The current engine state to modify
Input | models | const Array<Model>& | Models whose KV caches may need cleanup
Input | request_id | const String& | ID of the request to abort
Input | finish_reason | String | Reason reported for the abort (default: "abort")
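A simplified sketch of what an abort routine like AbortRequestImpl has to do: remove the request from whichever queue holds it, release its KV-cache entries in every model, and report the given finish reason. ToyModel, ToyEngineState, and AbortRequestSketch are hypothetical, and the assumption that only already-prefilled requests own KV-cache entries is a simplification. The real function takes an EngineState handle by value and a const Array<Model>&, while this sketch uses plain C++ references.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical model handle: tracks which request IDs own KV-cache entries.
struct ToyModel {
  std::set<std::string> kv_sequences;
  void RemoveSequence(const std::string& id) { kv_sequences.erase(id); }
};

// Hypothetical engine state shared by the engine and its actions.
struct ToyEngineState {
  std::vector<std::string> waiting;  // not yet prefilled: no KV cache allocated
  std::vector<std::string> running;  // prefilled: KV cache allocated in every model
  std::function<void(const std::string& id, const std::string& reason)> on_finish;
};

// Drops the request from its queue, frees its KV cache, and reports the reason.
// Returns false if the request is unknown (e.g., it already finished).
bool AbortRequestSketch(ToyEngineState& state, std::vector<ToyModel>& models,
                        const std::string& request_id,
                        const std::string& finish_reason = "abort") {
  auto erase_from = [&](std::vector<std::string>& queue) {
    auto it = std::find(queue.begin(), queue.end(), request_id);
    if (it == queue.end()) return false;
    queue.erase(it);
    return true;
  };
  if (erase_from(state.running)) {
    // Only running requests hold KV-cache entries that must be released.
    for (ToyModel& model : models) model.RemoveSequence(request_id);
  } else if (!erase_from(state.waiting)) {
    return false;  // nothing to abort
  }
  if (state.on_finish) state.on_finish(request_id, finish_reason);
  return true;
}
```

Taking the finish reason as a parameter lets the same routine serve both user-initiated aborts and engine-initiated terminations that want a different reason string.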

Usage Examples

Creating and running an engine:

#include "serve/engine.h"

// Define the stream callback; each call delivers a batch of outputs
FRequestStreamCallback callback([](Array<RequestStreamOutput> outputs) {
  for (const RequestStreamOutput& output : outputs) {
    // Process incremental generation results
    LOG(INFO) << "Got output for request: " << output->request_id;
  }
});

// Create the engine on the first CUDA device
Device device{kDLCUDA, 0};
std::string config_json = R"({"model": "dist/llama-2-7b-q4f16_1", ...})";
Result<EngineCreationOutput> result = Engine::Create(
    config_json, device, callback, Optional<EventTraceRecorder>());

if (result.IsOk()) {
  auto output = result.Unwrap();
  auto& engine = output.reloaded_engine;

  // Add a request
  Request request = /* construct request */;
  engine->AddRequest(request);

  // Main serving loop
  while (!engine->Empty()) {
    engine->Step();
  }
}

Aborting a request:

// Abort a specific request
engine->AbortRequest("req-12345");

// Abort all requests (e.g., during shutdown)
engine->AbortAllRequests();
