Implementation:Mlc ai Mlc llm CLI Main
Overview
The file python/mlc_llm/__main__.py is the top-level CLI entrypoint for MLC LLM. When the package is invoked via python -m mlc_llm, this module parses the first positional argument to determine which subcommand to run, then lazily imports and delegates to the appropriate CLI module.
Location
- Repository: Mlc_ai_Mlc_llm
- File: python/mlc_llm/__main__.py
- Lines: 69
Supported Subcommands
The CLI supports the following subcommands, each mapping to a dedicated module under mlc_llm.cli:
| Subcommand | CLI Module | Description |
|---|---|---|
| compile | mlc_llm.cli.compile | Compiles a model into a TVM-based shared library. |
| convert_weight | mlc_llm.cli.convert_weight | Converts model weights to the MLC format. |
| gen_config | mlc_llm.cli.gen_config | Generates the MLC chat configuration JSON file. |
| chat | mlc_llm.cli.chat | Runs an interactive chat session. |
| serve | mlc_llm.cli.serve | Starts an OpenAI-compatible serving endpoint. |
| package | mlc_llm.cli.package | Packages models for deployment. |
| calibrate | mlc_llm.cli.calibrate | Runs calibration for quantization. |
| router | mlc_llm.cli.router | Starts a router for multi-model serving. |
Implementation Details
Logging Initialization
The module begins by enabling MLC LLM's logging subsystem before any CLI processing occurs:
import sys

from mlc_llm.support import logging
from mlc_llm.support.argparse import ArgumentParser

logging.enable_logging()
Argument Parsing and Dispatch
The main() function uses a two-stage argument parsing approach:
def main():
    """Entrypoint of all CLI commands from MLC LLM"""
    parser = ArgumentParser("MLC LLM Command Line Interface.")
    parser.add_argument(
        "subcommand",
        type=str,
        choices=[
            "compile",
            "convert_weight",
            "gen_config",
            "chat",
            "serve",
            "package",
            "calibrate",
            "router",
        ],
        help="Subcommand to run. (choices: %(choices)s)",
    )
    parsed = parser.parse_args(sys.argv[1:2])
Stage 1: Only the first argument (sys.argv[1:2]) is parsed to determine the subcommand. This ensures that subcommand-specific arguments do not interfere with the top-level parser.
Stage 2: The selected subcommand's module is lazily imported and its main() function is called with the remaining arguments (sys.argv[2:]):
    if parsed.subcommand == "compile":
        from mlc_llm.cli import compile as cli
        cli.main(sys.argv[2:])
    elif parsed.subcommand == "convert_weight":
        from mlc_llm.cli import convert_weight as cli
        cli.main(sys.argv[2:])
    # ... additional subcommands follow the same pattern
    else:
        raise ValueError(f"Unknown subcommand {parsed.subcommand}")
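The two stages can be tried in isolation with a small stand-alone sketch. It uses the standard library argparse.ArgumentParser rather than MLC LLM's own ArgumentParser, and the names HANDLERS, compile_main, chat_main, and dispatch are hypothetical stand-ins, not part of MLC LLM:

```python
import argparse


# Hypothetical stand-in handlers. In MLC LLM these roles are played by the
# main(argv) functions of the modules under mlc_llm.cli.
def compile_main(argv):
    return ("compile", list(argv))


def chat_main(argv):
    return ("chat", list(argv))


HANDLERS = {"compile": compile_main, "chat": chat_main}


def dispatch(argv):
    """Parse argv[0] as the subcommand and forward argv[1:] untouched."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("subcommand", type=str, choices=sorted(HANDLERS))
    # Stage 1: the top-level parser only ever sees the first token, so
    # subcommand-specific flags cannot trigger "unrecognized arguments".
    parsed = parser.parse_args(argv[0:1])
    # Stage 2: everything after the subcommand goes to its handler as-is.
    return HANDLERS[parsed.subcommand](argv[1:])
```

Calling dispatch(["compile", "--flag", "value"]) hands ["--flag", "value"] to compile_main without the top-level parser inspecting it (the flag name is made up for the demo). An unknown name such as dispatch(["bogus"]) is rejected by the choices check, which prints a usage message and raises SystemExit with code 2.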
Lazy Import Pattern
All subcommand modules are imported inside the dispatch branches (marked with # pylint: disable=import-outside-toplevel). This lazy import strategy ensures that:
- Only the dependencies required for the selected subcommand are loaded.
- Startup time remains fast, as heavy dependencies (e.g., TVM, model loaders) are not imported unless needed.
- Each subcommand module is responsible for parsing its own arguments and executing its logic.
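The effect of deferring imports can be illustrated with a table-driven variant. This is a sketch under stated assumptions, not the file's actual code: the source uses plain import statements inside each branch, whereas this version swaps in importlib.import_module, and standard-library modules stand in for mlc_llm.cli.* so it runs anywhere. LAZY_MODULES, load_subcommand, and is_loaded are hypothetical names:

```python
import importlib
import sys

# Hypothetical mapping from a subcommand name to the module that backs it.
# Standard-library modules are used as placeholders for mlc_llm.cli.*.
LAZY_MODULES = {"colors": "colorsys", "stats": "statistics"}


def load_subcommand(name):
    """Import the module backing `name` only at the moment it is requested."""
    return importlib.import_module(LAZY_MODULES[name])


def is_loaded(name):
    """Report whether the backing module has been imported in this process."""
    return LAZY_MODULES[name] in sys.modules
```

For example, load_subcommand("colors").rgb_to_hsv(1.0, 0.0, 0.0) imports colorsys on demand and returns (0.0, 1.0, 1.0), while the module behind "stats" is not imported by that call.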
Module Execution
The standard Python module execution guard at the end enables direct invocation:
if __name__ == "__main__":
    main()
Design Notes
- The file uses MLC LLM's custom ArgumentParser from mlc_llm.support.argparse rather than the standard library argparse.ArgumentParser, providing consistent argument parsing behavior across the project.
- Each subcommand follows a uniform contract: the CLI module must expose a main(argv) function that accepts a list of argument strings.
- The two-stage parsing approach avoids the complexity of subparsers while still providing clean help messages and error reporting for invalid subcommand names.
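A module that satisfies the main(argv) contract might look like the following minimal sketch. The subcommand, its --name option, and the returned string are all invented for illustration, and the standard library parser is used in place of MLC LLM's ArgumentParser:

```python
import argparse


def main(argv):
    """Hypothetical subcommand entrypoint following the main(argv) contract."""
    parser = argparse.ArgumentParser(prog="demo greet")
    # --name is an illustrative option, not a real MLC LLM flag.
    parser.add_argument("--name", type=str, default="world")
    # Parse the list handed over by the dispatcher instead of reading
    # sys.argv directly, so the function can be called from tests.
    parsed = parser.parse_args(argv)
    return f"hello, {parsed.name}"
```

Because the argument list is passed in explicitly, the entrypoint can be exercised without spawning a process: main(["--name", "mlc"]) returns "hello, mlc", and main([]) falls back to the default.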