
Principle: mlc-ai/mlc-llm Model Library Packaging

From Leeroopedia


Knowledge Sources
Domains Deep_Learning, Mobile_Deployment
Last Updated 2026-02-09 00:00 GMT

Overview

Model library packaging is the process of compiling machine learning model computation graphs into platform-specific native libraries and assembling them with runtime bindings into deployable bundles for mobile applications.

Description

After a model's architecture and quantization scheme have been defined, the model computation graph must be compiled into executable code for the target hardware. On mobile platforms, this compilation produces platform-specific artifacts: static libraries (.a files) for iOS that are linked into the Xcode project, and shared libraries (.so files) for Android that are loaded via JNI at runtime.

The packaging process involves several stages:

1. Model Library Compilation. Each model in the configuration is compiled just in time (JIT) using TVM's compilation infrastructure. Compilation takes the model architecture, quantization parameters, and any overrides (e.g., prefill_chunk_size) and produces a platform-specific binary containing the optimized compute kernels. Each model library is identified by a unique system library prefix, which allows multiple models to coexist in a single static library without symbol conflicts.
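
As an illustrative sketch of the prefix idea (not the exact MLC-LLM implementation), a unique system library prefix can be derived from the model ID by sanitizing it into a valid C symbol and appending a short content hash, so that two model IDs never collide:

```python
import hashlib
import re

def system_lib_prefix(model_id: str) -> str:
    """Derive a unique, C-symbol-safe prefix for a model library.

    Sanitizes the model ID to [A-Za-z0-9_] and appends a short hash of
    the original ID, so distinct IDs that sanitize to the same string
    still receive distinct prefixes.
    """
    sanitized = re.sub(r"[^A-Za-z0-9_]", "_", model_id)
    digest = hashlib.sha256(model_id.encode()).hexdigest()[:8]
    return f"{sanitized}_{digest}_"

# Two IDs that differ only in punctuation sanitize identically,
# but the hash suffix keeps their prefixes distinct.
a = system_lib_prefix("Llama-3-8B.q4f16_1")
b = system_lib_prefix("Llama-3-8B-q4f16_1")
assert a != b
```

The hash suffix is the hypothetical part here; the essential property, per the text above, is only that each model's symbols carry a prefix unique within the combined static library.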

2. Library Validation. After compilation, the packaging system validates that every model referenced in the application configuration has a corresponding compiled library with the correct symbol prefix. This is done by inspecting the global symbol table of the combined static library and verifying that each model's ___tvm_ffi__library_bin symbol is present.
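
The validation step can be sketched as follows. This is an illustrative reimplementation, assuming the global symbol table has already been dumped to text with a tool such as `nm`; the symbol-name convention (`<model_lib>___tvm_ffi__library_bin`) is taken from the description above:

```python
def find_missing_model_libs(symbol_table: str, model_libs: list[str]) -> list[str]:
    """Return the model_lib prefixes whose TVM FFI entry symbol is absent.

    `symbol_table` is the text output of a tool such as `nm` run on the
    combined static library; each model library is expected to expose a
    global symbol named `<model_lib>___tvm_ffi__library_bin`.
    """
    symbols = set()
    for line in symbol_table.splitlines():
        parts = line.split()
        if parts:
            symbols.add(parts[-1])  # the symbol name is the last column
    return [lib for lib in model_libs
            if f"{lib}___tvm_ffi__library_bin" not in symbols]

# Example: one of two expected model libraries is missing from the archive.
nm_output = """\
0000000000000000 D llama_q4f16_1___tvm_ffi__library_bin
0000000000000010 T some_other_symbol
"""
missing = find_missing_model_libs(nm_output, ["llama_q4f16_1", "phi_q4f16_1"])
# missing == ["phi_q4f16_1"]
```

Failing fast here, before platform binding assembly, turns a hard-to-diagnose runtime loading error into a clear build-time message naming the missing model.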

3. Platform Binding Assembly. The validated model libraries are combined with the platform-specific runtime bindings:

  • On iOS, model libraries are linked with the MLC-LLM static library and TVM runtime into a single static archive, which is then referenced by the Xcode project.
  • On Android, model libraries are linked with the tvm4j_runtime_packed shared library, and the resulting JNI bindings and Gradle build files are assembled into the mlc4j package structure.

4. Weight and Configuration Bundling. Model weights (for models with bundle_weight: true) are copied into the output bundle directory, and the runtime configuration file (mlc-app-config.json) is generated from the package configuration.
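
Putting the stages together, a minimal mlc-package-config.json might look like the following sketch. Field names mirror the concepts described above (device target, model list, bundle_weight, prefill_chunk_size override); treat the specific model name and values as placeholders:

```json
{
  "device": "iphone",
  "model_list": [
    {
      "model": "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",
      "model_id": "llama3_q4f16_1",
      "bundle_weight": true,
      "overrides": {
        "prefill_chunk_size": 128
      }
    }
  ]
}
```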

Usage

Use model library packaging when:

  • Building a mobile application that includes one or more LLMs for on-device inference
  • Creating a distributable package of compiled model libraries for a specific mobile platform
  • Validating that compiled model libraries are complete and contain the expected symbol tables
  • Automating the end-to-end pipeline from model configuration to deployable mobile artifacts

Theoretical Basis

The packaging pipeline follows a multi-stage process:

Input: mlc-package-config.json
       mlc_llm_source_dir
       output directory

Stage 1: Read Configuration
  - Parse device target (iphone/android)
  - Parse model list entries

Stage 2: For each model in model_list:
  a. Download model artifacts (if HF:// source)
  b. JIT compile model library for target device
     - Input: model architecture + quantization + overrides
     - Output: platform-specific compiled object (.tar)
  c. If bundle_weight is true:
     - Copy model weights to output/bundle/<model_id>/

Stage 3: Validate Model Libraries
  - Combine all compiled objects into a single static library
  - For each model, verify <model_lib>___tvm_ffi__library_bin
    exists in the global symbol table

Stage 4: Build Platform Bindings
  - iOS: run prepare_libs.sh, copy static libraries
  - Android: run prepare_libs.py, copy JNI output + Gradle files

Stage 5: Generate mlc-app-config.json
  - Map model_id -> model_lib, model_url/model_path

Output: output/lib/       (compiled libraries)
        output/bundle/    (weights + mlc-app-config.json)
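
Stage 5 amounts to a straightforward mapping from package-config entries to runtime records. A minimal sketch (illustrative, not the exact MLC-LLM code; the assumption that model_lib equals model_id is a simplification) could look like:

```python
def generate_app_config(package_config: dict) -> dict:
    """Build the runtime mlc-app-config.json contents from the package config.

    Each entry maps model_id to its compiled model_lib; bundled models get
    a local model_path under bundle/, others keep a remote model_url.
    """
    records = []
    for entry in package_config["model_list"]:
        record = {
            "model_id": entry["model_id"],
            "model_lib": entry["model_id"],  # simplification: lib named after the ID
        }
        if entry.get("bundle_weight"):
            record["model_path"] = f"bundle/{entry['model_id']}"
        else:
            record["model_url"] = entry["model"]
        records.append(record)
    return {"model_list": records}

cfg = generate_app_config({
    "model_list": [
        {"model_id": "llama3_q4f16_1", "model": "HF://org/model", "bundle_weight": True},
    ]
})
```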

Key design decisions:

  • JIT compilation allows the system to compile model libraries on demand rather than requiring pre-compiled artifacts. If a model library is not found in model_lib_path_for_prepare_libs, it is compiled automatically.
  • Symbol-based validation ensures that each compiled model library exposes the expected TVM FFI entry point, catching build errors before the application is assembled.
  • Static linking on iOS (versus shared libraries on Android) reflects Apple's restrictions on dynamically loaded code in iOS applications.

Related Pages

Implemented By
