Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Mlc ai Mlc llm Debug Protocol

From Leeroopedia


Overview

The Debug Protocol module defines debug and disaggregated serving configuration data structures used in MLC LLM. It is located at python/mlc_llm/protocol/debug_protocol.py (49 lines) and provides two Pydantic model classes: DisaggConfig for microserving metadata and DebugConfig for engine-level debug options.

Purpose

This module provides structured configuration for:

  • Debug options that control engine behavior during development and testing (e.g., ignoring EOS tokens, grammar execution modes)
  • Disaggregated serving metadata used in microserving APIs to coordinate KV-cache transfer between prefill and decode instances in a distributed inference setup

Debug options are available to the engine internally but are not exposed at the serving endpoint unless the --enable-debug flag is explicitly passed.

Key Components

DisaggConfig Class

A Pydantic BaseModel that carries metadata for disaggregated (microserving) inference APIs:

class DisaggConfig(BaseModel):
    """The class of metadata used in microserving APIs."""

    kind: Optional[Literal["prepare_receive", "remote_send", "start_generation"]] = None
    kv_append_metadata: Optional[str] = None
    kv_window_begin: Optional[int] = None
    kv_window_end: Optional[int] = None
    dst_group_offset: Optional[int] = None

Fields:

Field Type Description
kind Optional[Literal["prepare_receive", "remote_send", "start_generation"]] The type of disaggregated request
kv_append_metadata Optional[str] Base64-encoded KV append metadata
kv_window_begin Optional[int] Start index of the KV window of interest
kv_window_end Optional[int] End index of the KV window (supports Python-style negative indexing)
dst_group_offset Optional[int] KV data destination group offset

KV Window Semantics by Kind:

  • prepare_receive: Begin is always 0; [0:end] denotes the KV range to prefill on a prefill instance
  • remote_send: [begin:end] denotes the KV range to compute prefill and send to the decode instance
  • start_generation: End is always None; [begin:] denotes the KV range to prefill locally on the decode instance

DebugConfig Class

A Pydantic BaseModel that defines debug-time engine options:

class DebugConfig(BaseModel):
    ignore_eos: bool = False
    pinned_system_prompt: bool = False
    special_request: Optional[Literal["query_engine_metrics"]] = None
    grammar_execution_mode: Literal["constraint", "jump_forward"] = "jump_forward"
    disagg_config: Optional[DisaggConfig] = None

Fields:

Field Type Default Description
ignore_eos bool False When True, the engine will ignore end-of-sequence tokens during generation
pinned_system_prompt bool False Whether to pin the system prompt in memory
special_request Optional[Literal["query_engine_metrics"]] None Triggers a special request that bypasses the normal engine step flow; results are returned in the usage field
grammar_execution_mode Literal["constraint", "jump_forward"] "jump_forward" Controls how grammar-guided generation is executed
disagg_config Optional[DisaggConfig] None Optional disaggregated serving configuration

Special Requests

Special requests are handled by the engine differently from normal inference requests. They do not go through the standard engine step flow. The results of special requests are returned as a field of the usage object in the response. Currently, only "query_engine_metrics" is supported.

Dependencies

  • pydantic -- For data validation via BaseModel
  • typing -- For Literal and Optional type annotations

File Location

python/mlc_llm/protocol/debug_protocol.py

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment