
Implementation:Mlc ai Mlc llm Jit

From Leeroopedia


Knowledge Sources

  • Domains: Deep_Learning, Model_Serving, Compiler_Optimization
  • Last Updated: 2026-02-09 00:00 GMT

Overview

A concrete tool provided by MLC-LLM for just-in-time compilation of model libraries.

Description

The jit function compiles an MLC-LLM model into a platform-specific shared library on demand at runtime. It reads the model's mlc-chat-config.json to extract the model type and quantization scheme, computes a deterministic MD5 hash over the full compilation configuration (model config, overrides, optimization flags, target device), and checks whether a cached compiled artifact already exists under MLC_LLM_HOME/model_lib/. On a cache hit, it returns the cached library path immediately. On a cache miss, it invokes mlc_llm compile as a subprocess, producing a shared object (.so, .dll, or .dylib) or a .tar archive for mobile targets, then atomically moves the result into the cache directory for future reuse.
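The cache-key scheme described above can be sketched in a few lines. The helper names `cache_key` and `cached_lib_path` are illustrative, not the actual internals of `jit.py`, and the real implementation hashes additional fields; the point is that the key is a deterministic MD5 over the JSON-serialized configuration, so identical configurations always map to the same cached artifact:

```python
import hashlib
import json
from pathlib import Path


def cache_key(model_config: dict, overrides: dict, opt: str, device: str) -> str:
    """Deterministic MD5 over the full compilation configuration (illustrative)."""
    payload = json.dumps(
        {"model_config": model_config, "overrides": overrides,
         "opt": opt, "device": device},
        sort_keys=True,  # key order must not affect the hash
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def cached_lib_path(mlc_llm_home: Path, key: str, suffix: str = ".so") -> Path:
    """Where a cache hit would be looked up under MLC_LLM_HOME/model_lib/."""
    return mlc_llm_home / "model_lib" / f"{key}{suffix}"
```

Because `sort_keys=True` canonicalizes the JSON, any change to an override, the optimization flag, or the target device produces a different key, which is what forces a recompile on configuration changes.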

The function respects the MLC_JIT_POLICY environment variable, which can be set to ON (default, compile and cache), OFF (disable JIT entirely), REDO (always recompile ignoring cache), or READONLY (only use cached artifacts, fail if not found). For mobile targets (iPhone, Android), the function also manages a system_lib_prefix to uniquely namespace the compiled system library.
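The four policy values reduce to a simple decision on whether to reuse the cache, recompile, or fail. The helper below is a sketch of that decision table, not the library's actual API; the function name and return strings are invented for illustration:

```python
def jit_decision(policy: str, cache_hit: bool) -> str:
    """Map MLC_JIT_POLICY and cache state to an action (illustrative)."""
    if policy == "OFF":
        # JIT disabled entirely: a pre-compiled model_lib must be supplied.
        raise RuntimeError("JIT is disabled; provide a pre-compiled model library")
    if policy == "REDO":
        return "compile"      # always recompile, ignoring any cached artifact
    if cache_hit:
        return "use_cache"    # ON and READONLY both reuse a cached library
    if policy == "READONLY":
        # Cache miss with compilation disallowed is an error.
        raise RuntimeError("no cached model library found")
    return "compile"          # ON (default): compile, then cache the result
```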

Usage

Use this function when no pre-compiled model library is available and you want MLC-LLM to transparently compile the model on first use. This is the default behavior when model_lib is not explicitly provided to the engine constructor. It is also useful during development to quickly iterate on model configuration changes without a separate build step.
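The engine-side default can be illustrated with a small stub. `resolve_model_lib` is a hypothetical helper, not part of MLC-LLM; it stands in for the engine constructor's internal dispatch between an explicitly supplied library and a JIT fallback:

```python
from typing import Callable, Optional


def resolve_model_lib(
    model_lib: Optional[str],
    compile_fn: Callable[[], str],
) -> str:
    """Use an explicit library path if given, else fall back to JIT compilation."""
    if model_lib is not None:
        return model_lib   # pre-compiled library supplied: no JIT needed
    return compile_fn()    # no model_lib given: trigger compilation on first use


# Stand-in for a call that would return jit(...).model_lib_path:
fake_jit = lambda: "/cache/model_lib/abc123.so"
```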

Code Reference

Source Location

  • Repository: MLC-LLM
  • File: python/mlc_llm/interface/jit.py (Lines 50-181)

Signature

def jit(
    model_path: Path,
    overrides: Dict[str, Any],
    device: Union[Device, str],
    system_lib_prefix: Optional[str] = None,
    *,
    skip_log_jit_policy: bool = False,
) -> JITResult:
    """Just-in-time compile a MLC-Chat model."""

Return Type

@dataclasses.dataclass
class JITResult:
    """The jit compilation result class."""
    model_lib_path: str
    system_lib_prefix: Optional[str] = None

Import

from mlc_llm.interface.jit import jit, JITResult

I/O Contract

Inputs

  • model_path (pathlib.Path, required): Path to the model directory containing mlc-chat-config.json.
  • overrides (Dict[str, Any], required): Dictionary of model configuration overrides (e.g., context_window_size, prefill_chunk_size, tensor_parallel_shards, opt). The opt key, if present, specifies optimization flags (defaults to "O2").
  • device (Union[tvm.runtime.Device, str], required): Target device for compilation, such as "cuda", "metal", "iphone", or "android".
  • system_lib_prefix (Optional[str], optional): Prefix for the system library name. Auto-generated for mobile targets if not provided. Defaults to None.
  • skip_log_jit_policy (bool, optional, keyword-only): If True, suppresses logging of the current JIT policy. Defaults to False.

Outputs

  • result (JITResult): A dataclass containing model_lib_path (str, the path to the compiled shared library) and system_lib_prefix (Optional[str], the system library prefix used for mobile targets).

Usage Examples

Basic Usage

from pathlib import Path
from mlc_llm.interface.jit import jit

# Compile a model for CUDA with default optimization
result = jit(
    model_path=Path("dist/models/Llama-2-7b-chat-hf-q4f16_1"),
    overrides={},
    device="cuda",
)
print(f"Compiled library at: {result.model_lib_path}")

With Configuration Overrides

from pathlib import Path
from mlc_llm.interface.jit import jit

# Compile with custom context window size and tensor parallelism
result = jit(
    model_path=Path("dist/models/Llama-2-7b-chat-hf-q4f16_1"),
    overrides={
        "context_window_size": 4096,
        "tensor_parallel_shards": 2,
        "opt": "O2",
    },
    device="cuda",
)
print(f"Compiled library at: {result.model_lib_path}")

Controlling JIT Policy via Environment Variable

import os
from pathlib import Path
from mlc_llm.interface.jit import jit

# Only use cached libraries, never compile
os.environ["MLC_JIT_POLICY"] = "READONLY"

try:
    result = jit(
        model_path=Path("dist/models/Llama-2-7b-chat-hf-q4f16_1"),
        overrides={},
        device="cuda",
    )
except RuntimeError as e:
    print(f"No cached library found: {e}")

Related Pages

Implements Principle

Environment and Heuristic Links
