Implementation:Mlc ai Mlc llm CLI Calibrate
Overview
The file python/mlc_llm/cli/calibrate.py implements the command-line interface for the calibration subcommand of MLC LLM. Calibration is the process of collecting activation statistics from a model by running it on a representative dataset, which is used to inform quantization decisions. This module parses CLI arguments and delegates to the core calibrate() function in mlc_llm.interface.calibrate.
Location
- Repository: Mlc_ai_Mlc_llm
- File:
python/mlc_llm/cli/calibrate.py - Lines: 80
CLI Arguments
The module defines the following command-line arguments:
| Argument | Type | Default | Required | Description |
|---|---|---|---|---|
model (positional) |
str |
-- | Yes | The model to calibrate. |
--device |
str |
"auto" |
No | The device to deploy the model on. |
--model-lib |
str |
None |
No | Path to the compiled model library. |
--output / -o |
str |
-- | Yes | Output path for calibration data. |
--dataset |
str |
-- | Yes | Path to the calibration dataset (e.g., ShareGPT format). |
--num-calibration-samples |
int |
16 |
No | Number of samples to use for calibration. |
--seed |
int |
0 |
No | Random seed for reproducible sample selection. |
--overrides |
EngineConfigOverride |
"" |
No | Engine configuration overrides for serving parameters. |
Implementation Details
Imports and Dependencies
from mlc_llm.interface.calibrate import calibrate
from mlc_llm.interface.help import HELP
from mlc_llm.support.argparse import ArgumentParser
from .serve import EngineConfigOverride
The module imports:
calibratefrommlc_llm.interface.calibrate-- the core calibration logic.HELPfrommlc_llm.interface.help-- a shared dictionary of help text strings for consistent CLI documentation.EngineConfigOverridefrommlc_llm.cli.serve-- a dataclass that parses engine configuration override strings (e.g., memory utilization, chunk sizes).
Main Function
The main(argv) function constructs the argument parser, parses the arguments, and calls the core calibrate() function:
def main(argv):
"""Main entrypoint for calibration."""
parser = ArgumentParser("MLC LLM Calibration CLI")
# ... argument definitions ...
parsed = parser.parse_args(argv)
calibrate(
model=parsed.model,
device=parsed.device,
model_lib=parsed.model_lib,
output=parsed.output,
dataset=parsed.dataset,
num_calibration_samples=parsed.num_calibration_samples,
max_num_sequence=parsed.overrides.max_num_sequence,
max_total_sequence_length=parsed.overrides.max_total_seq_length,
prefill_chunk_size=parsed.overrides.prefill_chunk_size,
max_history_size=parsed.overrides.max_history_size,
gpu_memory_utilization=parsed.overrides.gpu_memory_utilization,
seed=parsed.seed,
)
EngineConfigOverride Integration
The --overrides argument uses the EngineConfigOverride.from_str classmethod as its type converter. This allows users to pass engine configuration parameters as a semicolon-separated string. The parsed override fields that are forwarded to the calibration function include:
max_num_sequence-- Maximum number of sequences processed concurrently.max_total_seq_length-- Maximum total sequence length across all sequences.prefill_chunk_size-- Size of chunks for prefill operations.max_history_size-- Maximum history window size.gpu_memory_utilization-- Fraction of GPU memory available for inference.
Dataset Reference
The source code includes a comment noting the recommended calibration dataset:
# Download dataset from
# https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
This is the ShareGPT Vicuna dataset in JSON format, which provides multi-turn conversation data suitable for calibrating chat-oriented language models.
Design Notes
- Help text is sourced from the shared
HELPdictionary rather than being defined inline, ensuring consistent documentation across all CLI subcommands. - The module follows the standard MLC LLM CLI contract: it exposes a
main(argv)function that receives the remaining argument list from the top-level dispatcher in__main__.py. - The thin CLI layer delegates all heavy logic to the
mlc_llm.interface.calibratemodule, maintaining separation between argument parsing and core functionality.