Implementation:Mlc ai Mlc llm CLI Model Metadata

From Leeroopedia


Overview

The file python/mlc_llm/cli/model_metadata.py implements a CLI tool for inspecting the metadata embedded in compiled MLC LLM model libraries. It can display the full metadata as JSON or perform a detailed analysis of memory usage, including parameter sizes, temporary buffer requirements, and KV cache costs.

Location

  • Repository: Mlc_ai_Mlc_llm
  • File: python/mlc_llm/cli/model_metadata.py
  • Lines: 197

Key Components

_extract_metadata

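import json
from pathlib import Path
from typing import Any, Dict

def _extract_metadata(model_lib: Path) -> Dict[str, Any]:
    # TVM is imported lazily so the runtime is loaded only when metadata is read.
    from tvm.runtime import device, load_module
    from tvm.runtime.vm import VirtualMachine

    return json.loads(VirtualMachine(load_module(model_lib), device("cpu"))["_metadata"]())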

This function loads the compiled model library using TVM's load_module, instantiates a VirtualMachine on the CPU device, and calls the embedded _metadata function. The returned JSON string is parsed into a Python dictionary. The imports are performed inside the function so that the TVM runtime is not loaded until it is needed.

_report_all

def _report_all(metadata: Dict[str, Any]) -> None:

Formats and prints the full metadata as beautified JSON. The function applies special formatting to the "params" list so that each parameter entry is compacted onto a single line, while the rest of the metadata remains indented for readability.
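
The formatting described above can be sketched as follows. This is a minimal illustration, assuming the metadata is a plain dict whose "params" value is a list of JSON-serializable entries. The helper name format_metadata and the placeholder string are hypothetical and not taken from model_metadata.py; the actual implementation may differ.

```python
import json
from typing import Any, Dict


def format_metadata(metadata: Dict[str, Any], indent: int = 2) -> str:
    """Pretty-print metadata, keeping each "params" entry on a single line."""
    if "params" not in metadata:
        return json.dumps(metadata, indent=indent)
    # Hypothetical marker; assumed not to occur anywhere else in the data.
    placeholder = "__PARAMS_PLACEHOLDER__"
    shallow = dict(metadata)  # shallow copy so the caller's dict is not mutated
    shallow["params"] = placeholder
    pad = " " * indent
    # One compact JSON object per line, indented one level below "params".
    rows = ",\n".join(pad * 2 + json.dumps(p) for p in metadata["params"])
    text = json.dumps(shallow, indent=indent)
    return text.replace(json.dumps(placeholder), "[\n" + rows + "\n" + pad + "]", 1)


# Illustrative metadata; the field values are made up for this example.
example = {
    "model_type": "llama",
    "params": [
        {"name": "w", "shape": [2, 3], "dtype": "float16"},
        {"name": "e", "shape": ["vocab_size", 4], "dtype": "float32"},
    ],
}
print(format_metadata(example))
```

Swapping the list for a placeholder before serializing keeps the rest of the output under the control of json.dumps, so only the "params" block needs custom layout.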

_read_dynamic_shape

def _read_dynamic_shape(shape: List[Union[int, str]], config: Union[Dict, ConfigBase]) -> List[int]:

Resolves dynamic shapes in parameter definitions. When a parameter shape contains string elements (e.g., "vocab_size"), this function looks up the concrete integer value from the model configuration dictionary. It raises:

  • AttributeError if no configuration is provided but dynamic shapes are encountered.
  • KeyError if the dynamic shape key is not found in the configuration.
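
A minimal sketch of this lookup, assuming the configuration is a plain dictionary (the real function also accepts a ConfigBase object, which is not modelled here). The name resolve_dynamic_shape and the error messages are hypothetical.

```python
from typing import Any, Dict, List, Optional, Union


def resolve_dynamic_shape(
    shape: List[Union[int, str]], config: Optional[Dict[str, Any]]
) -> List[int]:
    """Replace symbolic (string) dimensions with integers from the config."""
    resolved: List[int] = []
    for dim in shape:
        if isinstance(dim, int):
            resolved.append(dim)
            continue
        if config is None:
            # Mirrors the documented behaviour: dynamic shape but no config given.
            raise AttributeError(f"Dynamic dimension {dim!r} found, but no config was provided")
        if dim not in config:
            raise KeyError(f"Dynamic dimension {dim!r} is not present in the config")
        resolved.append(int(config[dim]))
    return resolved


# Illustrative values only.
print(resolve_dynamic_shape(["vocab_size", 4096], {"vocab_size": 32000}))
```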

_compute_memory_usage

def _compute_memory_usage(metadata: Dict[str, Any], config: Union[Dict, ConfigBase]):

Computes two memory quantities from the metadata:

  1. Parameter bytes -- The total memory required for all model parameters. For each parameter, the product of its shape dimensions is multiplied by its data type size (using tvm.runtime.DataType.itemsize), and the results are summed over all parameters.
  2. Temporary function bytes -- The peak temporary buffer memory across all functions, determined by taking the maximum of the memory_usage entries in the metadata.

Returns both values as a tuple (params_bytes, temp_func_bytes).
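
The two quantities can be sketched without TVM as shown below. Several details are assumptions made for illustration: the hand-written ITEMSIZE table stands in for tvm.runtime.DataType.itemsize, "memory_usage" is taken to be a mapping from function name to bytes, the config is a plain dict, and the name compute_memory_usage is hypothetical.

```python
import math
from typing import Any, Dict, Optional, Tuple

# Assumed stand-in for tvm.runtime.DataType(...).itemsize; covers a few dtypes only.
ITEMSIZE = {"float32": 4, "float16": 2, "bfloat16": 2, "uint32": 4, "int8": 1}


def compute_memory_usage(
    metadata: Dict[str, Any], config: Optional[Dict[str, Any]] = None
) -> Tuple[int, int]:
    """Return (params_bytes, temp_func_bytes) for the given metadata."""
    lookup = config or {}
    params_bytes = 0
    for param in metadata["params"]:
        # Resolve symbolic dimensions through the config; ints pass through unchanged.
        shape = [d if isinstance(d, int) else lookup[d] for d in param["shape"]]
        params_bytes += math.prod(shape) * ITEMSIZE[param["dtype"]]
    # Peak temporary memory: the largest per-function requirement.
    temp_func_bytes = max(metadata["memory_usage"].values(), default=0)
    return params_bytes, temp_func_bytes


# Illustrative metadata; the numbers are made up.
example = {
    "params": [
        {"name": "w", "shape": [2, 3], "dtype": "float16"},
        {"name": "e", "shape": ["vocab_size", 4], "dtype": "float32"},
    ],
    "memory_usage": {"prefill": 100, "decode": 40},
}
print(compute_memory_usage(example, {"vocab_size": 10}))
```

Taking the maximum rather than the sum for temporary buffers reflects the assumption that only one function's scratch memory is live at a time.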

_report_memory_usage

def _report_memory_usage(metadata: Dict[str, Any], config: Union[Dict, ConfigBase]) -> None:

Generates a detailed memory report including:

  • Total memory usage without KV cache -- Sum of parameter bytes and temporary buffer bytes, reported in megabytes.
  • KV cache size per token -- Computed when the config provides head_dim, num_hidden_layers, and num_key_value_heads, and the metadata includes a quantization field. The formula is:

      bytes_per_token = head_dim * num_hidden_layers * num_key_value_heads * dtype_bytes * 2

    The factor of 2 accounts for both key and value tensors. The dtype is inferred from the quantization string (f32 = 4 bytes, f16/bf16 = 2 bytes).

  • Total memory with 4K KV cache -- The total memory usage assuming a context window of 4096 tokens.
  • A hint to tweak prefill_chunk_size, context_window_size, and sliding_window_size to reduce memory consumption.
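
As a worked example of the KV cache formula, the sketch below uses illustrative model dimensions (head_dim 128, 32 layers, 32 key/value heads) that are not taken from any particular model library. The function names are hypothetical, the substring check on the quantization string is an assumption about how the dtype is inferred, and 1 MB is taken to be 1024 * 1024 bytes.

```python
from typing import Optional


def kv_dtype_bytes(quantization: str) -> Optional[int]:
    """Guess the KV cache element size from a quantization string."""
    if "f32" in quantization:
        return 4
    if "f16" in quantization:  # this substring also matches "bf16"
        return 2
    return None  # unknown dtype: the per-token size cannot be computed


def kv_bytes_per_token(
    head_dim: int, num_hidden_layers: int, num_key_value_heads: int, dtype_bytes: int
) -> int:
    """Bytes of KV cache per token; the trailing 2 covers key and value tensors."""
    return head_dim * num_hidden_layers * num_key_value_heads * dtype_bytes * 2


# Illustrative dimensions with a 16-bit KV cache.
per_token = kv_bytes_per_token(128, 32, 32, kv_dtype_bytes("q4f16_1"))
kv_4k_mb = per_token * 4096 / 1024 / 1024

print(per_token)  # 524288 bytes, i.e. 0.5 MB per token
print(kv_4k_mb)   # 2048.0 MB for a 4096-token context
```

With these numbers the KV cache for a 4096-token context adds 2048 MB on top of the parameter and temporary buffer memory, which is why reducing the context or sliding window size has a large effect on the total.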

CLI Entry Point

def main():
    parser = ArgumentParser(description="A tool that inspects the metadata of a model lib.")
    parser.add_argument("model_lib", type=Path, help="...")
    parser.add_argument("--mlc-chat-config", type=Path, help="...")
    parser.add_argument("--memory-only", action="store_true", help="...")
    parsed = parser.parse_args()

CLI arguments:

| Argument | Type | Required | Description |
|---|---|---|---|
| model_lib (positional) | Path | Yes | Path to the compiled model library (.so or .a). |
| --mlc-chat-config | Path | No | Path to mlc-chat-config.json. Required only when --memory-only is set and the model library contains dynamic parameter shapes. |
| --memory-only | flag | No | When set, only memory usage analysis is displayed. Otherwise, the full metadata JSON is printed. |

Execution flow:

  1. The metadata is extracted from the model library using _extract_metadata. If extraction fails (e.g., legacy model library format), the error is logged and the tool exits gracefully.
  2. If --mlc-chat-config is provided, the JSON configuration is loaded from disk.
  3. If --memory-only is set, _report_memory_usage is called. Otherwise, _report_all prints the full metadata.
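
The three steps above can be sketched as a small dispatcher. This is an illustration under stated assumptions, not the file's actual main(): the extraction and reporting functions are passed in as callables so the sketch runs without TVM, the names build_parser and run are hypothetical, help strings are omitted, and the integer return codes are an assumption (the source only says the tool exits gracefully on failure).

```python
import json
import logging
from argparse import ArgumentParser
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional

logger = logging.getLogger(__name__)


def build_parser() -> ArgumentParser:
    parser = ArgumentParser(description="A tool that inspects the metadata of a model lib.")
    parser.add_argument("model_lib", type=Path)
    parser.add_argument("--mlc-chat-config", type=Path)
    parser.add_argument("--memory-only", action="store_true")
    return parser


def run(
    argv: List[str],
    extract: Callable[[Path], Dict[str, Any]],
    report_memory: Callable[[Dict[str, Any], Optional[Dict[str, Any]]], None],
    report_all: Callable[[Dict[str, Any]], None],
) -> int:
    parsed = build_parser().parse_args(argv)
    # Step 1: extract metadata; any failure is logged instead of propagating.
    try:
        metadata = extract(parsed.model_lib)
    except Exception:  # broad catch, matching the described handling of legacy libraries
        logger.exception("Cannot extract metadata from %s", parsed.model_lib)
        return 1  # assumed exit status
    # Step 2: load the optional chat config.
    config = None
    if parsed.mlc_chat_config is not None:
        with open(parsed.mlc_chat_config, "r", encoding="utf-8") as config_file:
            config = json.load(config_file)
    # Step 3: choose the report.
    if parsed.memory_only:
        report_memory(metadata, config)
    else:
        report_all(metadata)
    return 0


# Usage with stand-in callables; "model.so" is a placeholder path that is never opened.
run(["model.so"], lambda path: {"params": []}, lambda m, c: print("memory"), print)
```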

Design Notes

  • The tool gracefully handles legacy model libraries that lack metadata sections by catching all exceptions during extraction and logging an informative error message.
  • Dynamic shape resolution allows the tool to work with model libraries compiled with symbolic dimensions, which is common in MLC LLM's compilation pipeline.
  • The KV cache calculation includes a TODO comment noting that quantized KV caches are not yet supported in the size calculation.
  • The file also supports direct execution via the if __name__ == "__main__" guard.
