Principle: mlc-ai/mlc-llm Model Library Packaging
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Mobile_Deployment |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Model library packaging is the process of compiling machine learning model computation graphs into platform-specific native libraries and assembling them with runtime bindings into deployable bundles for mobile applications.
Description
After a model's architecture and quantization scheme have been defined, the model computation graph must be compiled into executable code for the target hardware. On mobile platforms, this compilation produces platform-specific artifacts: static libraries (.a files) for iOS that are linked into the Xcode project, and shared libraries (.so files) for Android that are loaded via JNI at runtime.
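These platform conventions can be captured in a small sketch. The table and helper below are illustrative only (not part of the mlc_llm API); the names `PLATFORM_ARTIFACTS` and `artifact_name` are assumptions made for this example.

```python
# Illustrative mapping of the platform-specific artifact conventions described
# above. These names are assumptions for this sketch, not the mlc_llm API.
PLATFORM_ARTIFACTS = {
    "iphone":  {"library_ext": ".a",  "linkage": "static", "loaded_via": "Xcode link step"},
    "android": {"library_ext": ".so", "linkage": "shared", "loaded_via": "JNI at runtime"},
}

def artifact_name(platform: str, model_lib: str) -> str:
    """Return the expected library filename for a compiled model on a platform."""
    ext = PLATFORM_ARTIFACTS[platform]["library_ext"]
    return f"lib{model_lib}{ext}"

print(artifact_name("android", "llama_q4f16_1"))  # libllama_q4f16_1.so
```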
The packaging process involves several stages:
1. Model Library Compilation. Each model in the configuration is just-in-time (JIT) compiled using TVM's compilation infrastructure. The compilation takes the model architecture, quantization parameters, and any overrides (e.g., prefill_chunk_size) and produces a platform-specific binary containing the optimized compute kernels. Each model library is identified by a unique system library prefix that allows multiple models to coexist in a single static library without symbol conflicts.
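One way to think about the unique system library prefix: derive it from the model's identity so that two different models can never emit colliding symbols. The derivation below is a hedged sketch of the idea, not the prefix scheme mlc_llm actually uses.

```python
# Sketch of the unique-prefix idea: each (model, quantization) pair gets a
# distinct C-identifier prefix, so multiple models can share one static
# library without symbol conflicts. Not mlc_llm's actual scheme.
import hashlib

def system_lib_prefix(model_id: str, quantization: str) -> str:
    """Derive a symbol prefix unique to a (model, quantization) pair."""
    digest = hashlib.sha256(f"{model_id}-{quantization}".encode()).hexdigest()[:8]
    # Prefixes become part of C symbol names, so keep only identifier characters.
    return f"{model_id}_{quantization}_{digest}_".replace("-", "_")

p1 = system_lib_prefix("Llama-3-8B", "q4f16_1")
p2 = system_lib_prefix("phi-2", "q4f16_1")
assert p1 != p2  # no symbol collisions in the combined archive
```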
2. Library Validation. After compilation, the packaging system validates that every model referenced in the application configuration has a corresponding compiled library with the correct symbol prefix. This is done by inspecting the global symbol table of the combined static library and verifying that each model's ___tvm_ffi__library_bin symbol is present.
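This validation step can be sketched with `nm`, which lists an archive's global symbols. The helper names below are assumptions for illustration; the check itself (grep the symbol table for each model's entry point) mirrors the description above.

```python
# Minimal sketch (not mlc_llm's actual code) of symbol-based validation:
# dump the archive's global symbols with `nm -g` and check that each
# model's expected FFI entry-point symbol is present.
import subprocess

def missing_entry_points(symbols: str, prefixes: list[str]) -> list[str]:
    """Given `nm -g` output, return prefixes lacking their FFI entry symbol."""
    return [p for p in prefixes
            if f"{p}___tvm_ffi__library_bin" not in symbols]

def validate_static_lib(path: str, prefixes: list[str]) -> list[str]:
    """Run `nm -g` on the combined archive and report missing model symbols."""
    out = subprocess.run(["nm", "-g", path], capture_output=True,
                         text=True, check=True).stdout
    return missing_entry_points(out, prefixes)
```

A non-empty return value means a model was referenced in the app configuration but its compiled library never made it into the archive.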
3. Platform Binding Assembly. The validated model libraries are combined with the platform-specific runtime bindings:
- On iOS, model libraries are linked with the MLC-LLM static library and TVM runtime into a single static archive, which is then referenced by the Xcode project.
- On Android, model libraries are linked with the tvm4j_runtime_packed shared library, and the resulting JNI bindings and Gradle build files are assembled into the mlc4j package structure.
4. Weight and Configuration Bundling. Model weights (for models with bundle_weight: true) are copied into the output bundle directory, and the runtime configuration file (mlc-app-config.json) is generated from the package configuration.
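The configuration-generation step can be sketched as follows. The exact schema of mlc-app-config.json is richer than this; the field names and the `write_app_config` helper are simplifying assumptions for illustration.

```python
# Hedged sketch of runtime-config generation: map each package entry to the
# fields the app reads at startup. The real mlc-app-config.json schema has
# more fields than shown here.
import json
import pathlib

def write_app_config(model_list: list[dict], bundle_dir: str) -> None:
    """Emit a simplified mlc-app-config.json into the bundle directory."""
    entries = []
    for m in model_list:
        entry = {"model_id": m["model_id"], "model_lib": m["model_lib"]}
        if m.get("bundle_weight"):
            # Weights were copied into bundle/<model_id>/, so load locally.
            entry["model_path"] = m["model_id"]
        else:
            # Otherwise the app downloads weights from the model URL.
            entry["model_url"] = m["model"]
        entries.append(entry)
    path = pathlib.Path(bundle_dir) / "mlc-app-config.json"
    path.write_text(json.dumps({"model_list": entries}, indent=2))
```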
Usage
Use model library packaging when:
- Building a mobile application that includes one or more LLMs for on-device inference
- Creating a distributable package of compiled model libraries for a specific mobile platform
- Validating that compiled model libraries are complete and contain the expected symbol tables
- Automating the end-to-end pipeline from model configuration to deployable mobile artifacts
Theoretical Basis
The packaging pipeline follows a multi-stage process:
```
Input:  mlc-package-config.json
        mlc_llm_source_dir
        output directory

Stage 1: Read Configuration
  - Parse device target (iphone/android)
  - Parse model list entries

Stage 2: For each model in model_list:
  a. Download model artifacts (if HF:// source)
  b. JIT compile model library for target device
     - Input: model architecture + quantization + overrides
     - Output: platform-specific compiled object (.tar)
  c. If bundle_weight is true:
     - Copy model weights to output/bundle/<model_id>/

Stage 3: Validate Model Libraries
  - Combine all compiled objects into a single static library
  - For each model, verify <model_lib>___tvm_ffi__library_bin
    exists in the global symbol table

Stage 4: Build Platform Bindings
  - iOS: run prepare_libs.sh, copy static libraries
  - Android: run prepare_libs.py, copy JNI output + Gradle files

Stage 5: Generate mlc-app-config.json
  - Map model_id -> model_lib, model_url/model_path

Output: output/lib/    (compiled libraries)
        output/bundle/ (weights + mlc-app-config.json)
```
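The five stages can be sketched as one driver function. The stage bodies below are placeholders that record what would happen rather than doing the real compilation and linking; every name here is an assumption made for illustration.

```python
# Illustrative driver for the five-stage pipeline above. Stage bodies are
# placeholders that log the work instead of performing it; names are
# assumptions, not the mlc_llm package API.
def run_pipeline(config: dict, output: str) -> list:
    log = []
    device = config["device"]                        # Stage 1: parse target
    for model in config["model_list"]:               # Stage 2: per-model work
        log.append(("jit_compile", model["model_id"], device))
        if model.get("bundle_weight"):
            log.append(("copy_weights", model["model_id"]))
    log.append(("validate_symbols",                  # Stage 3: symbol check
                tuple(m["model_lib"] for m in config["model_list"])))
    log.append(("build_bindings", device))           # Stage 4: prepare_libs.*
    log.append(("write_app_config", f"{output}/bundle"))  # Stage 5: config
    return log
```

Note that validation (Stage 3) runs only after every model has been compiled, since it inspects the combined archive, not individual objects.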
Key design decisions:
- JIT compilation allows the system to compile model libraries on demand rather than requiring pre-compiled artifacts. If a model library is not found in model_lib_path_for_prepare_libs, it is compiled automatically.
- Symbol-based validation ensures that each compiled model library exposes the expected TVM FFI entry point, catching build errors before the application is assembled.
- Static linking on iOS (versus shared libraries on Android) reflects Apple's restrictions on dynamically loaded code in iOS applications.