Implementation: mlc-ai/mlc-llm JIT
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Model_Serving, Compiler_Optimization |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
MLC-LLM's concrete tool for just-in-time compilation of model libraries.
Description
The jit function compiles an MLC-LLM model into a platform-specific shared library on demand at runtime. It reads the model's mlc-chat-config.json to extract the model type and quantization scheme, computes a deterministic MD5 hash over the full compilation configuration (model config, overrides, optimization flags, target device), and checks whether a cached compiled artifact already exists under MLC_LLM_HOME/model_lib/. On a cache hit, it returns the cached library path immediately. On a cache miss, it invokes mlc_llm compile as a subprocess, producing a shared object (.so, .dll, or .dylib) or a .tar archive for mobile targets, then atomically moves the result into the cache directory for future reuse.
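The caching scheme described above can be sketched as follows. This is an illustrative approximation, not the library's actual code: `make_cache_key`, `cached_lib_path`, and the field names in the hashed payload are assumptions; only the overall approach (a deterministic MD5 over the full compilation configuration, used to name artifacts under the cache directory) comes from the description.

```python
import hashlib
import json
from pathlib import Path


def make_cache_key(model_config: dict, overrides: dict, opt: str, device: str) -> str:
    """Hypothetical sketch: derive a deterministic cache key by hashing the
    full compilation configuration (model config, overrides, optimization
    flags, target device), as the description above outlines."""
    # Serialize with sorted keys so identical configurations always
    # produce byte-identical JSON, and therefore the same digest.
    payload = json.dumps(
        {
            "model_config": model_config,
            "overrides": overrides,
            "opt": opt,
            "device": device,
        },
        sort_keys=True,
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def cached_lib_path(cache_dir: Path, key: str, suffix: str = ".so") -> Path:
    """Hypothetical cache-layout helper: on a hit the artifact at this path
    is reused; on a miss the compiler output is moved here for next time."""
    return cache_dir / f"model_lib_{key}{suffix}"
```

Sorting the keys before hashing is what makes the key deterministic: two calls with semantically equal configurations hit the same cache entry regardless of dict insertion order.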
The function respects the MLC_JIT_POLICY environment variable, which can be set to ON (default, compile and cache), OFF (disable JIT entirely), REDO (always recompile ignoring cache), or READONLY (only use cached artifacts, fail if not found). For mobile targets (iPhone, Android), the function also manages a system_lib_prefix to uniquely namespace the compiled system library.
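The four-way policy dispatch can be summarized with a small decision function. This is a hedged sketch of the behavior described above, not code from the library; `resolve_jit_action` and its return values are illustrative assumptions.

```python
import os
from typing import Optional

VALID_POLICIES = ("ON", "OFF", "REDO", "READONLY")


def resolve_jit_action(cache_hit: bool, policy: Optional[str] = None) -> str:
    """Hypothetical sketch of MLC_JIT_POLICY handling: decide whether to
    reuse a cached artifact or compile, based on the policy semantics
    described above."""
    policy = (policy or os.environ.get("MLC_JIT_POLICY", "ON")).upper()
    if policy not in VALID_POLICIES:
        raise ValueError(f"Unknown MLC_JIT_POLICY: {policy}")
    if policy == "OFF":
        # JIT disabled entirely.
        raise RuntimeError("JIT compilation is disabled (MLC_JIT_POLICY=OFF)")
    if policy == "REDO":
        # Always recompile, ignoring any cached artifact.
        return "compile"
    if cache_hit:
        # ON and READONLY both reuse an existing cached artifact.
        return "use_cache"
    if policy == "READONLY":
        # READONLY never compiles; a cache miss is an error.
        raise RuntimeError("No cached library found and MLC_JIT_POLICY=READONLY")
    # ON with a cache miss: compile, then cache the result.
    return "compile"
```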
Usage
Use this function when no pre-compiled model library is available and you want MLC-LLM to transparently compile the model on first use. This is the default behavior when model_lib is not explicitly provided to the engine constructor. It is also useful during development to quickly iterate on model configuration changes without a separate build step.
Code Reference
Source Location
- Repository: MLC-LLM
- File: python/mlc_llm/interface/jit.py (lines 50-181)
Signature
```python
def jit(
    model_path: Path,
    overrides: Dict[str, Any],
    device: Union[Device, str],
    system_lib_prefix: Optional[str] = None,
    *,
    skip_log_jit_policy: bool = False,
) -> JITResult:
    """Just-in-time compile a MLC-Chat model."""
```
Return Type
```python
@dataclasses.dataclass
class JITResult:
    """The jit compilation result class."""

    model_lib_path: str
    system_lib_prefix: Optional[str] = None
```
Import
```python
from mlc_llm.interface.jit import jit, JITResult
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_path | pathlib.Path | Yes | Path to the model directory containing mlc-chat-config.json. |
| overrides | Dict[str, Any] | Yes | Dictionary of model configuration overrides (e.g., context_window_size, prefill_chunk_size, tensor_parallel_shards, opt). The opt key, if present, specifies optimization flags (defaults to "O2"). |
| device | Union[tvm.runtime.Device, str] | Yes | Target device for compilation, such as "cuda", "metal", "iphone", or "android". |
| system_lib_prefix | Optional[str] | No | Prefix for the system library name. Auto-generated for mobile targets if not provided. Defaults to None. |
| skip_log_jit_policy | bool | No | Keyword-only. If True, suppresses logging of the current JIT policy. Defaults to False. |
Outputs
| Name | Type | Description |
|---|---|---|
| result | JITResult | A dataclass containing model_lib_path (str, the path to the compiled shared library) and system_lib_prefix (Optional[str], the system library prefix used for mobile targets). |
Usage Examples
Basic Usage
```python
from pathlib import Path

from mlc_llm.interface.jit import jit

# Compile a model for CUDA with default optimization
result = jit(
    model_path=Path("dist/models/Llama-2-7b-chat-hf-q4f16_1"),
    overrides={},
    device="cuda",
)
print(f"Compiled library at: {result.model_lib_path}")
```
With Configuration Overrides
```python
from pathlib import Path

from mlc_llm.interface.jit import jit

# Compile with custom context window size and tensor parallelism
result = jit(
    model_path=Path("dist/models/Llama-2-7b-chat-hf-q4f16_1"),
    overrides={
        "context_window_size": 4096,
        "tensor_parallel_shards": 2,
        "opt": "O2",
    },
    device="cuda",
)
print(f"Compiled library at: {result.model_lib_path}")
```
Controlling JIT Policy via Environment Variable
```python
import os
from pathlib import Path

from mlc_llm.interface.jit import jit

# Only use cached libraries, never compile
os.environ["MLC_JIT_POLICY"] = "READONLY"

try:
    result = jit(
        model_path=Path("dist/models/Llama-2-7b-chat-hf-q4f16_1"),
        overrides={},
        device="cuda",
    )
except RuntimeError as e:
    print(f"No cached library found: {e}")
```