Implementation:Mlc ai Mlc llm CLI Lib Delivery
Overview
The file python/mlc_llm/cli/lib_delivery.py implements the continuous model library delivery pipeline for MLC LLM. It reads a JSON specification file describing a set of models, quantization options, and target devices, then compiles each combination into a shared library artifact. This is used for automated batch compilation and delivery of pre-built model libraries.
Location
- Repository: Mlc_ai_Mlc_llm
- File: python/mlc_llm/cli/lib_delivery.py
- Lines: 199
Key Components
ModelInfo Dataclass
@dataclasses.dataclass
class ModelInfo:
    model_id: str
    model: Path
    quantization: str
    device: str
    overrides: Dict[str, int]
The ModelInfo dataclass encapsulates all the information needed to compile a single model variant:
| Field | Description |
|---|---|
| model_id | A unique identifier for the model (e.g., "Llama-2-7b-chat-hf"). |
| model | The filesystem path to the model. |
| quantization | The quantization scheme to apply (e.g., "q4f16_1"). |
| device | The target device for compilation (e.g., "cuda", "metal", "webgpu"). |
| overrides | A dictionary of configuration overrides such as context_window_size, prefill_chunk_size, sliding_window_size, attention_sink_size, max_batch_size, and tensor_parallel_shards. |
DeferredScope Context Manager
class DeferredScope:
    def __init__(self):
        self.deferred_functions = []

    def add(self, func: Callable[[], None]):
        self.deferred_functions.append(func)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        for func in reversed(self.deferred_functions):
            func()
        return False

    def create_temp_dir(self) -> Path:
        temp_dir = tempfile.mkdtemp(dir=MLC_TEMP_DIR)
        self.add(lambda: shutil.rmtree(temp_dir, ignore_errors=True))
        return Path(temp_dir)
DeferredScope is a utility context manager that accumulates cleanup functions and executes them in reverse order upon scope exit. It provides a create_temp_dir() convenience method that creates a temporary directory and automatically registers its deletion. This follows the Go-style defer pattern.
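The cleanup ordering can be illustrated with a minimal, self-contained sketch. It re-declares a trimmed copy of the class so that it runs on its own, and it assumes the system default temporary location instead of MLC_TEMP_DIR. The demo_deferred_scope function is a hypothetical name used only for illustration.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple


class DeferredScope:
    """Trimmed copy of the class above so this sketch runs standalone."""

    def __init__(self) -> None:
        self.deferred_functions: List[Callable[[], None]] = []

    def add(self, func: Callable[[], None]) -> None:
        self.deferred_functions.append(func)

    def __enter__(self) -> "DeferredScope":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        # Run cleanups in reverse registration order, like Go's `defer`.
        for func in reversed(self.deferred_functions):
            func()
        return False  # exceptions raised inside the block still propagate

    def create_temp_dir(self) -> Path:
        # Assumption: system default temp location instead of MLC_TEMP_DIR.
        temp_dir = tempfile.mkdtemp()
        self.add(lambda: shutil.rmtree(temp_dir, ignore_errors=True))
        return Path(temp_dir)


def demo_deferred_scope() -> Tuple[List[str], Path, bool]:
    """Hypothetical demo: report cleanup order and temp dir lifetime."""
    order: List[str] = []
    with DeferredScope() as scope:
        scope.add(lambda: order.append("first registered"))
        scope.add(lambda: order.append("second registered"))
        work_dir = scope.create_temp_dir()
        (work_dir / "scratch.txt").write_text("intermediate output")
        existed_inside = work_dir.exists()
    return order, work_dir, existed_inside
```

Inside the with block the temporary directory exists. On exit it is removed first (it was registered last), followed by the two callbacks in reverse order of registration.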
_run_compilation Function
The _run_compilation function handles the compilation of a single model library:
def _run_compilation(model_info: ModelInfo, repo_dir: Path) -> bool:
Process:
- Determines the library extension based on the target device:
  - cuda, vulkan, metal produce .so files.
  - android, ios produce .tar archives.
  - webgpu produces .wasm files.
- Constructs a compilation command that invokes python -m mlc_llm compile as a subprocess with the model path, device, quantization, overrides, and output path.
- Executes the command in a temporary directory, capturing stdout and stderr to a log file.
- Copies the compiled library from the temporary directory to the target repository directory under repo_dir / model_id / model_lib_name.
The function returns True on success and False if the compiled library file is not found after the subprocess completes.
The library filename follows the convention: {model_id}-{quantization}-{device}{extension}.
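The extension lookup and naming convention can be sketched as follows. This is an illustration built only from the description above, not the module's actual code: DEVICE_TO_EXTENSION and model_lib_name are hypothetical names, and raising ValueError for an unknown device is an assumption.

```python
from typing import Dict

# Device-to-extension mapping as described in the process list above.
DEVICE_TO_EXTENSION: Dict[str, str] = {
    "cuda": ".so",
    "vulkan": ".so",
    "metal": ".so",
    "android": ".tar",
    "ios": ".tar",
    "webgpu": ".wasm",
}


def model_lib_name(model_id: str, quantization: str, device: str) -> str:
    """Hypothetical helper: build `{model_id}-{quantization}-{device}{extension}`."""
    if device not in DEVICE_TO_EXTENSION:
        # Assumption: unknown devices are rejected; the real module may differ.
        raise ValueError(f"Unsupported device: {device}")
    extension = DEVICE_TO_EXTENSION[device]
    return f"{model_id}-{quantization}-{device}{extension}"


if __name__ == "__main__":
    print(model_lib_name("Llama-2-7b-chat-hf", "q4f16_1", "cuda"))
```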
_main Function
The _main function orchestrates the full batch compilation:
def _main(spec: Dict[str, Any]):
Process:
- Iterates over each task in the specification's "tasks" list.
- For each task, it creates the Cartesian product of:
  - Compile options: The union of spec["default_compile_options"] and task-specific compile_options.
  - Quantizations: The union of spec["default_quantization"] and task-specific quantization.
- Calls _run_compilation for each combination.
- Tracks and reports all failed cases at the end with their model ID, quantization, device, and overrides.
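The expansion of one task into its compilation combinations can be sketched with itertools.product. This is a sketch under stated assumptions: expand_task is a hypothetical helper, and the "union" of defaults and task-specific values is modelled here as simple list concatenation, which may not match the module exactly.

```python
import itertools
from typing import Any, Dict, List, Tuple

Combination = Tuple[str, str, Dict[str, int]]  # (device, quantization, overrides)


def expand_task(spec: Dict[str, Any], task: Dict[str, Any]) -> List[Combination]:
    """Hypothetical helper: list every (device, quantization, overrides) for a task."""
    # Assumption: defaults come first, followed by task-specific entries.
    compile_options = spec.get("default_compile_options", []) + task.get("compile_options", [])
    quantizations = spec.get("default_quantization", []) + task.get("quantization", [])
    return [
        (option["device"], quantization, option.get("overrides", {}))
        for option, quantization in itertools.product(compile_options, quantizations)
    ]
```

With two compile options and two quantizations, a single task expands into four combinations, each of which would be passed to one compilation call.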
Spec File Format
The JSON specification file is expected to contain:
| Key | Description |
|---|---|
| tasks | A list of task objects, each with model_id, model, and optional compile_options and quantization overrides. |
| default_compile_options | A list of default compile option objects, each with a device field and optional overrides. |
| default_quantization | A list of default quantization strings applied to all tasks. |
| binary_libs_dir | The output directory where compiled libraries are stored. |
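As an illustration of the keys in the table, the snippet below builds a small spec as a Python dictionary and serializes it to JSON. All values (paths, model name, override numbers) are hypothetical placeholders chosen for the example, not values taken from the project.

```python
import json

# Hypothetical spec; only the key names come from the table above.
example_spec = {
    "binary_libs_dir": "dist/binary-libs",  # hypothetical output directory
    "default_compile_options": [
        {"device": "cuda"},
        {
            "device": "webgpu",
            "overrides": {"context_window_size": 4096, "prefill_chunk_size": 1024},
        },
    ],
    "default_quantization": ["q4f16_1"],
    "tasks": [
        {
            "model_id": "Llama-2-7b-chat-hf",
            "model": "dist/models/Llama-2-7b-chat-hf",  # hypothetical model path
        },
    ],
}

# The text that would be saved as the JSON file passed to --spec.
spec_text = json.dumps(example_spec, indent=2)

if __name__ == "__main__":
    print(spec_text)
```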
CLI Entry Point
def main():
    parser = ArgumentParser("MLC LLM continuous library delivery")
    parser.add_argument(
        "--spec",
        type=_load_spec,
        required=True,
        help="Path to the spec file",
    )
    parsed = parser.parse_args()
    _main(spec=parsed.spec)
The main() function accepts a single required argument --spec pointing to the JSON specification file. The _load_spec helper validates the file exists and parses it from JSON.
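The body of _load_spec is not shown here, so the following is only a sketch of how such an argparse type callable might look, based on the one-sentence description above. The names load_spec_sketch and parse_args_sketch are hypothetical, and raising argparse.ArgumentTypeError for a missing file is an assumption.

```python
import argparse
import json
from pathlib import Path
from typing import Any, Dict, List


def load_spec_sketch(path_spec: str) -> Dict[str, Any]:
    """Hypothetical stand-in for `_load_spec`: check the path, then parse JSON."""
    path = Path(path_spec)
    if not path.is_file():
        # argparse reports this as a usage error for the --spec argument.
        raise argparse.ArgumentTypeError(f"Spec file does not exist: {path}")
    with path.open("r", encoding="utf-8") as file:
        return json.load(file)


def parse_args_sketch(argv: List[str]) -> argparse.Namespace:
    """Parse `--spec` so that `parsed.spec` is already the decoded dictionary."""
    parser = argparse.ArgumentParser("MLC LLM continuous library delivery")
    parser.add_argument(
        "--spec",
        type=load_spec_sketch,
        required=True,
        help="Path to the spec file",
    )
    return parser.parse_args(argv)
```

Because the conversion happens in the type callable, the rest of the program receives a parsed dictionary rather than a path string.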
Design Notes
- The module uses subprocess invocation to compile each model, calling python -m mlc_llm compile. This isolates each compilation in its own process, preventing memory leaks or state corruption between compilations.
- Temporary directories are created under MLC_TEMP_DIR (from mlc_llm.support.constants) rather than the system default, allowing centralized control of temporary file locations.
- Styled logging output uses the bold, green, and red formatting functions from mlc_llm.support.style for clear visual feedback during batch processing.
- The file also supports direct execution via the if __name__ == "__main__" guard.
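The subprocess isolation and log capture described in the first note can be sketched as below. This is an illustration, not the module's code: run_logged and demo_run_logged are hypothetical names, and a trivial Python one-liner stands in for the real python -m mlc_llm compile command so that the sketch runs without MLC LLM installed.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List, Tuple


def run_logged(cmd: List[str], work_dir: Path, log_path: Path) -> int:
    """Hypothetical helper: run `cmd` in `work_dir`, logging stdout and stderr."""
    with log_path.open("w", encoding="utf-8") as log_file:
        completed = subprocess.run(
            cmd,
            cwd=work_dir,
            stdout=log_file,
            stderr=subprocess.STDOUT,  # send stderr to the same log file
            check=False,  # the caller decides how to treat failures
        )
    return completed.returncode


def demo_run_logged() -> Tuple[int, str]:
    """Run a stand-in child process and return its exit code and log text."""
    with tempfile.TemporaryDirectory() as tmp:
        work_dir = Path(tmp)
        log_path = work_dir / "compile.log"
        # Stand-in for the real compile command.
        cmd = [sys.executable, "-c", "print('compiling...')"]
        code = run_logged(cmd, work_dir, log_path)
        return code, log_path.read_text(encoding="utf-8")
```

Any memory or global state used by the child process is released when it exits, which is the isolation benefit the note refers to.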