Implementation:Mlc ai Mlc llm CLI Calibrate

Overview

The file python/mlc_llm/cli/calibrate.py implements the command-line interface for the calibration subcommand of MLC LLM. Calibration is the process of collecting activation statistics from a model by running it on a representative dataset, which is used to inform quantization decisions. This module parses CLI arguments and delegates to the core calibrate() function in mlc_llm.interface.calibrate.

Location

Repository: Mlc_ai_Mlc_llm
File: python/mlc_llm/cli/calibrate.py
Lines: 80

CLI Arguments

The module defines the following command-line arguments:

Argument	Type	Default	Required	Description
`model` (positional)	`str`	--	Yes	The model to calibrate.
`--device`	`str`	`"auto"`	No	The device to deploy the model on.
`--model-lib`	`str`	`None`	No	Path to the compiled model library.
`--output` / `-o`	`str`	--	Yes	Output path for calibration data.
`--dataset`	`str`	--	Yes	Path to the calibration dataset (e.g., ShareGPT format).
`--num-calibration-samples`	`int`	`16`	No	Number of samples to use for calibration.
`--seed`	`int`	`0`	No	Random seed for reproducible sample selection.
`--overrides`	`EngineConfigOverride`	`""`	No	Engine configuration overrides for serving parameters.

Implementation Details

Imports and Dependencies

from mlc_llm.interface.calibrate import calibrate
from mlc_llm.interface.help import HELP
from mlc_llm.support.argparse import ArgumentParser

from .serve import EngineConfigOverride

The module imports:

calibrate from mlc_llm.interface.calibrate -- the core calibration logic.
HELP from mlc_llm.interface.help -- a shared dictionary of help text strings for consistent CLI documentation.
EngineConfigOverride from mlc_llm.cli.serve -- a dataclass that parses engine configuration override strings (e.g., memory utilization, chunk sizes).

Main Function

The main(argv) function constructs the argument parser, parses the arguments, and calls the core calibrate() function:

def main(argv):
    """Main entrypoint for calibration."""
    parser = ArgumentParser("MLC LLM Calibration CLI")
    # ... argument definitions ...
    parsed = parser.parse_args(argv)
    calibrate(
        model=parsed.model,
        device=parsed.device,
        model_lib=parsed.model_lib,
        output=parsed.output,
        dataset=parsed.dataset,
        num_calibration_samples=parsed.num_calibration_samples,
        max_num_sequence=parsed.overrides.max_num_sequence,
        max_total_sequence_length=parsed.overrides.max_total_seq_length,
        prefill_chunk_size=parsed.overrides.prefill_chunk_size,
        max_history_size=parsed.overrides.max_history_size,
        gpu_memory_utilization=parsed.overrides.gpu_memory_utilization,
        seed=parsed.seed,
    )

EngineConfigOverride Integration

The --overrides argument uses the EngineConfigOverride.from_str classmethod as its type converter. This allows users to pass engine configuration parameters as a semicolon-separated string. The parsed override fields that are forwarded to the calibration function include:

max_num_sequence -- Maximum number of sequences processed concurrently.
max_total_seq_length -- Maximum total sequence length across all sequences.
prefill_chunk_size -- Size of chunks for prefill operations.
max_history_size -- Maximum history window size.
gpu_memory_utilization -- Fraction of GPU memory available for inference.

Dataset Reference

The source code includes a comment noting the recommended calibration dataset:

# Download dataset from
# https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json

This is the ShareGPT Vicuna dataset in JSON format, which provides multi-turn conversation data suitable for calibrating chat-oriented language models.

Design Notes

Help text is sourced from the shared HELP dictionary rather than being defined inline, ensuring consistent documentation across all CLI subcommands.
The module follows the standard MLC LLM CLI contract: it exposes a main(argv) function that receives the remaining argument list from the top-level dispatcher in __main__.py.
The thin CLI layer delegates all heavy logic to the mlc_llm.interface.calibrate module, maintaining separation between argument parsing and core functionality.

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment