Implementation:Mlc ai Mlc llm CLI Lib Delivery

Overview

The file python/mlc_llm/cli/lib_delivery.py implements the continuous model library delivery pipeline for MLC LLM. It reads a JSON specification file describing a set of models, quantization options, and target devices, then compiles each combination into a model library artifact. The pipeline is intended for automated batch compilation and delivery of pre-built model libraries.

Location

Repository: Mlc_ai_Mlc_llm
File: python/mlc_llm/cli/lib_delivery.py
Lines: 199

Key Components

ModelInfo Dataclass

@dataclasses.dataclass
class ModelInfo:
    model_id: str
    model: Path
    quantization: str
    device: str
    overrides: Dict[str, int]

The ModelInfo dataclass encapsulates all the information needed to compile a single model variant:

Field	Description
`model_id`	A unique identifier for the model (e.g., `"Llama-2-7b-chat-hf"`).
`model`	The filesystem path to the model.
`quantization`	The quantization scheme to apply (e.g., `"q4f16_1"`).
`device`	The target device for compilation (e.g., `"cuda"`, `"metal"`, `"webgpu"`).
`overrides`	A dictionary of configuration overrides such as `context_window_size`, `prefill_chunk_size`, `sliding_window_size`, `attention_sink_size`, `max_batch_size`, and `tensor_parallel_shards`.
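
As a rough illustration, a single model variant could be described as follows. The dataclass is re-declared so the snippet runs on its own, and the model path and override values are made-up placeholders rather than values taken from the repository.

```python
import dataclasses
from pathlib import Path
from typing import Dict


@dataclasses.dataclass
class ModelInfo:
    """Local copy of the dataclass shown above, so the example is self-contained."""

    model_id: str
    model: Path
    quantization: str
    device: str
    overrides: Dict[str, int]


# Hypothetical values: the path and the override numbers are placeholders.
info = ModelInfo(
    model_id="Llama-2-7b-chat-hf",
    model=Path("dist/models/Llama-2-7b-chat-hf"),
    quantization="q4f16_1",
    device="cuda",
    overrides={"context_window_size": 4096, "prefill_chunk_size": 1024},
)

# dataclasses.asdict gives a plain dict, which is convenient for logging a case.
print(dataclasses.asdict(info))
```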

DeferredScope Context Manager

class DeferredScope:
    def __init__(self):
        self.deferred_functions = []

    def add(self, func: Callable[[], None]):
        self.deferred_functions.append(func)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        for func in reversed(self.deferred_functions):
            func()
        return False

    def create_temp_dir(self) -> Path:
        temp_dir = tempfile.mkdtemp(dir=MLC_TEMP_DIR)
        self.add(lambda: shutil.rmtree(temp_dir, ignore_errors=True))
        return Path(temp_dir)

DeferredScope is a utility context manager that accumulates cleanup functions and runs them in reverse registration order when the scope exits, mirroring Go's defer statement. Because __exit__ returns False, an exception raised inside the with block still propagates after the cleanup functions have run. The create_temp_dir() convenience method creates a temporary directory under MLC_TEMP_DIR and automatically registers its deletion.
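
The reverse-order behaviour can be seen in a small usage sketch. The class is re-declared here so the snippet is self-contained, and it uses the system temporary directory instead of MLC_TEMP_DIR, which is an assumption made only so it runs outside the MLC LLM package.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List


class DeferredScope:
    """Re-declared from the listing above; uses the system temp dir, not MLC_TEMP_DIR."""

    def __init__(self) -> None:
        self.deferred_functions: List[Callable[[], None]] = []

    def add(self, func: Callable[[], None]) -> None:
        self.deferred_functions.append(func)

    def __enter__(self) -> "DeferredScope":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        # Run cleanups last-registered-first, then let any exception propagate.
        for func in reversed(self.deferred_functions):
            func()
        return False

    def create_temp_dir(self) -> Path:
        temp_dir = tempfile.mkdtemp()
        self.add(lambda: shutil.rmtree(temp_dir, ignore_errors=True))
        return Path(temp_dir)


events: List[str] = []
with DeferredScope() as scope:
    work_dir = scope.create_temp_dir()
    scope.add(lambda: events.append("first registered"))
    scope.add(lambda: events.append("second registered"))
    (work_dir / "scratch.txt").write_text("temporary data", encoding="utf-8")

# After the block, the callbacks have run in reverse order and work_dir is gone.
print(events, work_dir.exists())
```

Registering the directory removal first means it runs last, so later cleanups can still read files in the temporary directory.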

_run_compilation Function

The _run_compilation function handles the compilation of a single model library:

def _run_compilation(model_info: ModelInfo, repo_dir: Path) -> bool:

Process:

1. Determines the library extension based on the target device:
   - cuda, vulkan, metal produce .so files.
   - android, ios produce .tar archives.
   - webgpu produces .wasm files.
2. Constructs a compilation command that invokes python -m mlc_llm compile as a subprocess with the model path, device, quantization, overrides, and output path.
3. Executes the command in a temporary directory, capturing stdout and stderr to a log file.
4. Copies the compiled library from the temporary directory to the target repository directory under repo_dir / model_id / model_lib_name.

The function returns True on success and False if the compiled library file is not found after the subprocess completes.
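
The command construction and log capture described in the steps above might look roughly like the sketch below. The helper names build_compile_command and run_logged are hypothetical, and both the flag names and the key=value;key=value encoding of overrides are assumptions about the mlc_llm compile CLI, not quotations from the file.

```python
import subprocess
import sys
from pathlib import Path
from typing import Dict, List


def build_compile_command(
    model: Path, device: str, quantization: str, overrides: Dict[str, int], output: Path
) -> List[str]:
    """Assemble an argv list for `python -m mlc_llm compile` (flag names are assumed)."""
    # Assumed encoding: overrides joined as "key=value" pairs separated by ";".
    override_arg = ";".join(f"{key}={value}" for key, value in overrides.items())
    return [
        sys.executable, "-m", "mlc_llm", "compile", str(model),
        "--device", device,
        "--quantization", quantization,
        "--overrides", override_arg,
        "--output", str(output),
    ]


def run_logged(cmd: List[str], log_path: Path) -> int:
    """Run cmd, appending its stdout and stderr to log_path, and return the exit code."""
    with log_path.open("a", encoding="utf-8") as log_file:
        # Record the command itself first so the log is self-describing.
        print(" ".join(cmd), file=log_file, flush=True)
        completed = subprocess.run(
            cmd, stdout=log_file, stderr=subprocess.STDOUT, check=False
        )
    return completed.returncode


# Placeholder paths and override values, for illustration only.
cmd = build_compile_command(
    model=Path("dist/models/Llama-2-7b-chat-hf"),
    device="cuda",
    quantization="q4f16_1",
    overrides={"context_window_size": 4096, "prefill_chunk_size": 1024},
    output=Path("out/Llama-2-7b-chat-hf-q4f16_1-cuda.so"),
)
print(" ".join(cmd))
```

Passing the command as a list rather than a shell string avoids quoting problems when paths contain spaces.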

The library filename follows the convention: {model_id}-{quantization}-{device}{extension}.
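
A minimal sketch of the extension lookup and naming convention follows. The mapping mirrors the device list given above; the helper name model_lib_name and the choice to raise ValueError for an unlisted device are assumptions made for this illustration.

```python
from typing import Dict

# Device-to-extension mapping as listed in the process description.
LIB_EXTENSIONS: Dict[str, str] = {
    "cuda": ".so",
    "vulkan": ".so",
    "metal": ".so",
    "android": ".tar",
    "ios": ".tar",
    "webgpu": ".wasm",
}


def model_lib_name(model_id: str, quantization: str, device: str) -> str:
    """Build {model_id}-{quantization}-{device}{extension} for a known device."""
    if device not in LIB_EXTENSIONS:
        # Assumed behaviour for this sketch: reject devices outside the mapping.
        raise ValueError(f"Unsupported device: {device}")
    return f"{model_id}-{quantization}-{device}{LIB_EXTENSIONS[device]}"


print(model_lib_name("Llama-2-7b-chat-hf", "q4f16_1", "cuda"))
# prints: Llama-2-7b-chat-hf-q4f16_1-cuda.so
```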

_main Function

The _main function orchestrates the full batch compilation:

def _main(spec: Dict[str, Any]):

Process:

1. Iterates over each task in the specification's "tasks" list.
2. For each task, forms the Cartesian product of:
   - Compile options: the entries in spec["default_compile_options"] plus any task-specific compile_options.
   - Quantizations: the entries in spec["default_quantization"] plus any task-specific quantization entries.
3. Calls _run_compilation for each (compile option, quantization) combination.
4. Tracks every failed case and reports them all at the end with their model ID, quantization, device, and overrides.
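
The orchestration above can be sketched with a stand-in for the real compilation step. The names run_batch and compile_one are hypothetical, and combining the default and task-specific lists by simple concatenation is an assumption about how the two sources are merged.

```python
from typing import Any, Callable, Dict, List

Case = Dict[str, Any]


def run_batch(spec: Dict[str, Any], compile_one: Callable[[Case], bool]) -> List[Case]:
    """Compile every (compile option, quantization) pair per task; return failures."""
    failed: List[Case] = []
    for task in spec["tasks"]:
        # Assumed merge rule: defaults first, then any task-specific entries.
        compile_options = spec["default_compile_options"] + task.get("compile_options", [])
        quantizations = spec["default_quantization"] + task.get("quantization", [])
        for option in compile_options:
            for quantization in quantizations:
                case = {
                    "model_id": task["model_id"],
                    "model": task["model"],
                    "quantization": quantization,
                    "device": option["device"],
                    "overrides": option.get("overrides", {}),
                }
                if not compile_one(case):
                    failed.append(case)
    return failed


# Placeholder spec and a fake compiler that "fails" on webgpu, for illustration.
demo_spec = {
    "default_compile_options": [{"device": "cuda"}, {"device": "webgpu"}],
    "default_quantization": ["q4f16_1"],
    "tasks": [{"model_id": "demo-model", "model": "dist/models/demo", "quantization": ["q0f16"]}],
}
attempted: List[Case] = []


def fake_compile(case: Case) -> bool:
    attempted.append(case)
    return case["device"] != "webgpu"


failures = run_batch(demo_spec, fake_compile)
for case in failures:
    print(case["model_id"], case["quantization"], case["device"], case["overrides"])
```

Collecting failures instead of aborting on the first one lets a long batch finish and report every broken combination in a single run.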

Spec File Format

The JSON specification file is expected to contain:

Key	Description
`tasks`	A list of task objects, each with `model_id`, `model`, and optional task-specific `compile_options` and `quantization` lists that are combined with the defaults.
`default_compile_options`	A list of default compile option objects, each with a `device` field and optional `overrides`.
`default_quantization`	A list of default quantization strings applied to all tasks.
`binary_libs_dir`	The output directory where compiled libraries are stored.
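
A spec with these four keys could look like the following, written as a Python dictionary and serialized to JSON. All model names, paths, and override values are placeholders, and the REQUIRED_KEYS check is a hypothetical convenience that is not described as part of the file.

```python
import json

# Hypothetical spec: every name, path, and number here is a placeholder.
spec = {
    "binary_libs_dir": "dist/binary-mlc-llm-libs",
    "default_quantization": ["q4f16_1"],
    "default_compile_options": [
        {"device": "cuda", "overrides": {"context_window_size": 4096}},
        {"device": "webgpu"},
    ],
    "tasks": [
        {"model_id": "Llama-2-7b-chat-hf", "model": "dist/models/Llama-2-7b-chat-hf"},
        {
            "model_id": "demo-model",
            "model": "dist/models/demo-model",
            "quantization": ["q0f16"],
            "compile_options": [{"device": "metal"}],
        },
    ],
}

# Quick sanity check that the top-level keys from the table are all present.
REQUIRED_KEYS = {"tasks", "default_compile_options", "default_quantization", "binary_libs_dir"}
missing = REQUIRED_KEYS - spec.keys()

spec_text = json.dumps(spec, indent=2)
print(spec_text)
```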

CLI Entry Point

def main():
    parser = ArgumentParser("MLC LLM continuous library delivery")
    parser.add_argument(
        "--spec",
        type=_load_spec,
        required=True,
        help="Path to the spec file",
    )
    parsed = parser.parse_args()
    _main(spec=parsed.spec)

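The main() function accepts a single required argument, --spec, which points to the JSON specification file. Because _load_spec is passed as the argument's type, argparse calls it on the given path: the helper validates that the file exists and parses its contents as JSON, so parsed.spec is already a dictionary when it reaches _main.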
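
The body of _load_spec is not shown on this page. A minimal sketch of such a helper, assuming it reports a missing file through argparse.ArgumentTypeError, might be the following; the name load_spec and the error message are illustrative only.

```python
import argparse
import json
from pathlib import Path
from typing import Any, Dict


def load_spec(path_spec: str) -> Dict[str, Any]:
    """Check that the spec file exists, then parse it as JSON."""
    path = Path(path_spec)
    if not path.is_file():
        # argparse turns this exception into a normal usage error message.
        raise argparse.ArgumentTypeError(f"Spec file does not exist: {path}")
    with path.open("r", encoding="utf-8") as spec_file:
        return json.load(spec_file)


# Using the loader as the argument type means parse_args returns the parsed dict.
parser = argparse.ArgumentParser("spec loader sketch")
parser.add_argument("--spec", type=load_spec, required=True, help="Path to the spec file")
```

Given the module path and its __main__ guard, an invocation would presumably take the form python -m mlc_llm.cli.lib_delivery --spec path/to/spec.json, where the spec path is a placeholder.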

Design Notes

The module compiles each model by invoking python -m mlc_llm compile as a subprocess. Running every compilation in its own process means that memory leaked or state corrupted during one compilation cannot carry over to the next.
Temporary directories are created under MLC_TEMP_DIR (from mlc_llm.support.constants) rather than the system default, allowing centralized control of temporary file locations.
Styled logging output uses the bold, green, and red formatting functions from mlc_llm.support.style for clear visual feedback during batch processing.
The file also supports direct execution via the if __name__ == "__main__" guard.
