Implementation:Mlc ai Mlc llm Debug Protocol
Overview
The Debug Protocol module defines debug and disaggregated serving configuration data structures used in MLC LLM. It is located at python/mlc_llm/protocol/debug_protocol.py (49 lines) and provides two Pydantic model classes: DisaggConfig for microserving metadata and DebugConfig for engine-level debug options.
Purpose
This module provides structured configuration for:
- Debug options that control engine behavior during development and testing (e.g., ignoring EOS tokens, grammar execution modes)
- Disaggregated serving metadata used in microserving APIs to coordinate KV-cache transfer between prefill and decode instances in a distributed inference setup
Debug options are available to the engine internally but are not exposed at the serving endpoint unless the --enable-debug flag is explicitly passed.
Key Components
DisaggConfig Class
A Pydantic BaseModel that carries metadata for disaggregated (microserving) inference APIs:
class DisaggConfig(BaseModel):
"""The class of metadata used in microserving APIs."""
kind: Optional[Literal["prepare_receive", "remote_send", "start_generation"]] = None
kv_append_metadata: Optional[str] = None
kv_window_begin: Optional[int] = None
kv_window_end: Optional[int] = None
dst_group_offset: Optional[int] = None
Fields:
| Field | Type | Description |
|---|---|---|
kind |
Optional[Literal["prepare_receive", "remote_send", "start_generation"]] |
The type of disaggregated request |
kv_append_metadata |
Optional[str] |
Base64-encoded KV append metadata |
kv_window_begin |
Optional[int] |
Start index of the KV window of interest |
kv_window_end |
Optional[int] |
End index of the KV window (supports Python-style negative indexing) |
dst_group_offset |
Optional[int] |
KV data destination group offset |
KV Window Semantics by Kind:
prepare_receive: Begin is always 0;[0:end]denotes the KV range to prefill on a prefill instanceremote_send:[begin:end]denotes the KV range to compute prefill and send to the decode instancestart_generation: End is alwaysNone;[begin:]denotes the KV range to prefill locally on the decode instance
DebugConfig Class
A Pydantic BaseModel that defines debug-time engine options:
class DebugConfig(BaseModel):
ignore_eos: bool = False
pinned_system_prompt: bool = False
special_request: Optional[Literal["query_engine_metrics"]] = None
grammar_execution_mode: Literal["constraint", "jump_forward"] = "jump_forward"
disagg_config: Optional[DisaggConfig] = None
Fields:
| Field | Type | Default | Description |
|---|---|---|---|
ignore_eos |
bool |
False |
When True, the engine will ignore end-of-sequence tokens during generation
|
pinned_system_prompt |
bool |
False |
Whether to pin the system prompt in memory |
special_request |
Optional[Literal["query_engine_metrics"]] |
None |
Triggers a special request that bypasses the normal engine step flow; results are returned in the usage field
|
grammar_execution_mode |
Literal["constraint", "jump_forward"] |
"jump_forward" |
Controls how grammar-guided generation is executed |
disagg_config |
Optional[DisaggConfig] |
None |
Optional disaggregated serving configuration |
Special Requests
Special requests are handled by the engine differently from normal inference requests. They do not go through the standard engine step flow. The results of special requests are returned as a field of the usage object in the response. Currently, only "query_engine_metrics" is supported.
Dependencies
pydantic-- For data validation viaBaseModeltyping-- ForLiteralandOptionaltype annotations
File Location
python/mlc_llm/protocol/debug_protocol.py