Implementation:Mlc ai Mlc llm Debug Protocol

Overview

The Debug Protocol module defines debug and disaggregated serving configuration data structures used in MLC LLM. It is located at python/mlc_llm/protocol/debug_protocol.py (49 lines) and provides two Pydantic model classes: DisaggConfig for microserving metadata and DebugConfig for engine-level debug options.

Purpose

This module provides structured configuration for:

Debug options that control engine behavior during development and testing (e.g., ignoring EOS tokens, grammar execution modes)
Disaggregated serving metadata used in microserving APIs to coordinate KV-cache transfer between prefill and decode instances in a distributed inference setup

Debug options are available to the engine internally but are not exposed at the serving endpoint unless the --enable-debug flag is explicitly passed.

Key Components

DisaggConfig Class

A Pydantic BaseModel that carries metadata for disaggregated (microserving) inference APIs:

class DisaggConfig(BaseModel):
    """The class of metadata used in microserving APIs."""

    kind: Optional[Literal["prepare_receive", "remote_send", "start_generation"]] = None
    kv_append_metadata: Optional[str] = None
    kv_window_begin: Optional[int] = None
    kv_window_end: Optional[int] = None
    dst_group_offset: Optional[int] = None

Fields:

Field	Type	Description
`kind`	`Optional[Literal["prepare_receive", "remote_send", "start_generation"]]`	The type of disaggregated request
`kv_append_metadata`	`Optional[str]`	Base64-encoded KV append metadata
`kv_window_begin`	`Optional[int]`	Start index of the KV window of interest
`kv_window_end`	`Optional[int]`	End index of the KV window (supports Python-style negative indexing)
`dst_group_offset`	`Optional[int]`	KV data destination group offset

KV Window Semantics by Kind:

prepare_receive: Begin is always 0; [0:end] denotes the KV range to prefill on a prefill instance
remote_send: [begin:end] denotes the KV range to compute prefill and send to the decode instance
start_generation: End is always None; [begin:] denotes the KV range to prefill locally on the decode instance

DebugConfig Class

A Pydantic BaseModel that defines debug-time engine options:

class DebugConfig(BaseModel):
    ignore_eos: bool = False
    pinned_system_prompt: bool = False
    special_request: Optional[Literal["query_engine_metrics"]] = None
    grammar_execution_mode: Literal["constraint", "jump_forward"] = "jump_forward"
    disagg_config: Optional[DisaggConfig] = None

Fields:

Field	Type	Default	Description
`ignore_eos`	`bool`	`False`	When `True`, the engine will ignore end-of-sequence tokens during generation
`pinned_system_prompt`	`bool`	`False`	Whether to pin the system prompt in memory
`special_request`	`Optional[Literal["query_engine_metrics"]]`	`None`	Triggers a special request that bypasses the normal engine step flow; results are returned in the `usage` field
`grammar_execution_mode`	`Literal["constraint", "jump_forward"]`	`"jump_forward"`	Controls how grammar-guided generation is executed
`disagg_config`	`Optional[DisaggConfig]`	`None`	Optional disaggregated serving configuration

Special Requests

Special requests are handled by the engine differently from normal inference requests. They do not go through the standard engine step flow. The results of special requests are returned as a field of the usage object in the response. Currently, only "query_engine_metrics" is supported.

Dependencies

pydantic -- For data validation via BaseModel
typing -- For Literal and Optional type annotations

File Location

python/mlc_llm/protocol/debug_protocol.py

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment