Implementation:Mlc ai Mlc llm CLI Main

Overview

The file python/mlc_llm/__main__.py is the top-level CLI entrypoint for MLC LLM. When the package is invoked via python -m mlc_llm, this module parses the first positional argument to determine which subcommand to run, then lazily imports and delegates to the appropriate CLI module.

Location

Repository: Mlc_ai_Mlc_llm
File: python/mlc_llm/__main__.py
Lines: 69

Supported Subcommands

The CLI supports the following subcommands, each mapping to a dedicated module under mlc_llm.cli:

Subcommand	CLI Module	Description
`compile`	`mlc_llm.cli.compile`	Compiles a model into a TVM-based shared library.
`convert_weight`	`mlc_llm.cli.convert_weight`	Converts model weights to the MLC format.
`gen_config`	`mlc_llm.cli.gen_config`	Generates the MLC chat configuration JSON file.
`chat`	`mlc_llm.cli.chat`	Runs an interactive chat session.
`serve`	`mlc_llm.cli.serve`	Starts an OpenAI-compatible serving endpoint.
`package`	`mlc_llm.cli.package`	Packages models for deployment.
`calibrate`	`mlc_llm.cli.calibrate`	Runs calibration for quantization.
`router`	`mlc_llm.cli.router`	Starts a router for multi-model serving.

Implementation Details

Logging Initialization

The module begins by enabling MLC LLM's logging subsystem before any CLI processing occurs:

import sys  # used by main() below to read the command-line arguments

from mlc_llm.support import logging
from mlc_llm.support.argparse import ArgumentParser

logging.enable_logging()

Argument Parsing and Dispatch

The main() function uses a two-stage argument parsing approach:

def main():
    """Entrypoint of all CLI commands from MLC LLM"""
    parser = ArgumentParser("MLC LLM Command Line Interface.")
    parser.add_argument(
        "subcommand",
        type=str,
        choices=[
            "compile",
            "convert_weight",
            "gen_config",
            "chat",
            "serve",
            "package",
            "calibrate",
            "router",
        ],
        help="Subcommand to to run. (choices: %(choices)s)",
    )
    parsed = parser.parse_args(sys.argv[1:2])

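Stage 1: Only the first argument (the slice sys.argv[1:2]) is parsed to determine the subcommand. Because the top-level parser never sees anything after the subcommand name, subcommand-specific flags cannot be rejected as unrecognized arguments at this level. The choices list also means an invalid subcommand name is reported by the parser itself.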
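The effect of the Stage 1 slice can be reproduced with the standard library alone. The following is a minimal sketch, assuming the stdlib argparse.ArgumentParser as a stand-in for MLC LLM's custom parser. split_argv is a hypothetical helper name that does not exist in mlc_llm, and --some-flag is a placeholder rather than a real option.

```python
import argparse

SUBCOMMANDS = [
    "compile", "convert_weight", "gen_config", "chat",
    "serve", "package", "calibrate", "router",
]


def split_argv(argv):
    """Return (subcommand, remaining_args) for a sys.argv-style list.

    Hypothetical helper: the stdlib parser stands in for
    mlc_llm.support.argparse.ArgumentParser.
    """
    parser = argparse.ArgumentParser("top-level sketch")
    parser.add_argument("subcommand", type=str, choices=SUBCOMMANDS)
    # Only argv[1:2] reaches the top-level parser, so flags intended for
    # the subcommand are never inspected here.
    parsed = parser.parse_args(argv[1:2])
    # Everything after the subcommand name is handed on untouched.
    return parsed.subcommand, argv[2:]


if __name__ == "__main__":
    # "--some-flag" is a placeholder, not a documented MLC LLM option.
    print(split_argv(["mlc_llm", "chat", "--some-flag", "value"]))
```

With the stdlib parser, a name outside the choices list makes parse_args print a usage error and exit, so the dispatch code only ever sees a valid subcommand.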

Stage 2: The selected subcommand's module is lazily imported and its main() function is called with the remaining arguments (sys.argv[2:]):

    if parsed.subcommand == "compile":
        from mlc_llm.cli import compile as cli
        cli.main(sys.argv[2:])
    elif parsed.subcommand == "convert_weight":
        from mlc_llm.cli import convert_weight as cli
        cli.main(sys.argv[2:])
    # ... additional subcommands follow the same pattern
    else:
        raise ValueError(f"Unknown subcommand {parsed.subcommand}")

Lazy Import Pattern

All subcommand modules are imported inside the dispatch branches rather than at the top of the file (marked with # pylint: disable=import-outside-toplevel). This lazy import strategy has the following consequences:

Only the dependencies required for the selected subcommand are loaded.
Startup time remains fast, as heavy dependencies (e.g., TVM, model loaders) are not imported unless needed.
Each subcommand module stays self-contained: it parses its own arguments and executes its own logic.
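The deferred-import idea can be sketched outside MLC LLM as follows. This is an illustration under stated assumptions, not the file's actual code: it swaps the if/elif chain for a lookup table plus importlib.import_module, and it uses standard-library modules as stand-ins so that it runs without MLC LLM installed. LAZY_MODULES and load_subcommand are hypothetical names.

```python
import importlib
import sys

# Stand-in mapping. In the real CLI each subcommand corresponds to a module
# under mlc_llm.cli; stdlib modules are used here purely for illustration.
LAZY_MODULES = {
    "colors": "colorsys",
    "wave": "wave",
}


def load_subcommand(name):
    """Import and return the module for `name` only when it is requested."""
    try:
        module_path = LAZY_MODULES[name]
    except KeyError:
        raise ValueError(f"Unknown subcommand {name}") from None
    # The import happens here, at dispatch time, not at program start.
    return importlib.import_module(module_path)


if __name__ == "__main__":
    loaded_before = "colorsys" in sys.modules
    module = load_subcommand("colors")
    print(loaded_before, "colorsys" in sys.modules, module.__name__)
```

A table keeps the sketch short. The explicit branches in the original file achieve the same deferral while keeping each import statement visible to static analysis tools.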

Module Execution

The standard Python module execution guard at the end enables direct invocation:

if __name__ == "__main__":
    main()

Design Notes

The file uses MLC LLM's custom ArgumentParser from mlc_llm.support.argparse rather than the standard library argparse.ArgumentParser, providing consistent argument parsing behavior across the project.
Each subcommand follows a uniform contract: the CLI module must expose a main(argv) function that accepts a list of argument strings.
The two-stage parsing approach avoids the complexity of subparsers while still providing clean help messages and error reporting for invalid subcommand names.
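The main(argv) contract described above can be illustrated with a hypothetical subcommand module. The sketch below assumes the stdlib argparse. The --name option, build_parser, and the greeting are invented for illustration and do not correspond to any real MLC LLM subcommand.

```python
import argparse


def build_parser():
    """Build the parser owned by this (hypothetical) subcommand."""
    parser = argparse.ArgumentParser("example subcommand")
    # Illustrative option only; not a real MLC LLM flag.
    parser.add_argument("--name", type=str, default="world")
    return parser


def main(argv):
    """Entry point called by the top-level dispatcher.

    `argv` is the list of strings that follow the subcommand name,
    i.e. what the dispatcher passes as sys.argv[2:].
    """
    parsed = build_parser().parse_args(argv)
    print(f"hello, {parsed.name}")


if __name__ == "__main__":
    main(["--name", "mlc"])
```

Because main takes an explicit list instead of reading sys.argv itself, a module written this way can also be driven from tests or other Python code without modifying global state.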
