Environment: MLC-AI MLC-LLM OpenCL Android Environment

Knowledge Sources	MLC-LLM Android NDK
Domains	Infrastructure, Mobile, GPU_Acceleration
Last Updated	2026-02-09 19:00 GMT

Overview

Android deployment environment using OpenCL GPU backend with NDK cross-compilation for on-device LLM inference on Qualcomm Adreno and other mobile GPUs.

Description

This environment enables LLM inference on Android devices via the OpenCL GPU backend. Models are cross-compiled with the Android NDK into either `.tar` static archives (system library mode) or `.so` shared libraries. Because the OpenCL runtime reports less memory than is actually available, MLC-LLM raises its GPU memory estimate to at least 5GB so that 7B/8B models can run. Adreno GPUs have a dedicated target preset that sets `max_threads_per_block=512`.

Usage

Use this environment when deploying LLMs to Android devices. It is required for the Mobile Deployment workflow, which covers packaging the model with `mlc_llm package`, building the Android app libraries with `prepare_libs.py`, and bundling weights onto the device via ADB.

System Requirements

Category	Requirement	Notes
OS	Android 10+ (API 29+)	Host: Linux or macOS for cross-compilation
Hardware	Mobile GPU with OpenCL support	Qualcomm Adreno recommended; Mali supported
Toolchain	Android NDK	Required for cross-compilation via `TVM_NDK_CC`
VRAM	5GB+ effective GPU memory	Runtime raises its GPU memory estimate to at least 5GB on OpenCL devices, regardless of the reported size
Disk	2GB+ on device	For model weights and compiled library

Dependencies

System Packages (Host)

Android NDK (set `TVM_NDK_CC` environment variable)
`cmake` < 4.0
`git`
`adb` (Android Debug Bridge, for device deployment)
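
The host requirements above can be checked before starting a build. The following is a minimal sketch, not part of MLC-LLM: the helper names `parse_cmake_version`, `cmake_is_supported`, and `missing_host_tools` are hypothetical, and the version check assumes the usual `cmake version X.Y.Z` first line printed by `cmake --version`.

```python
import re
import shutil


def parse_cmake_version(version_output: str) -> tuple:
    """Extract the numeric version from the text printed by `cmake --version`."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        raise ValueError("could not find a CMake version string")
    return tuple(int(part) for part in match.groups() if part is not None)


def cmake_is_supported(version_output: str) -> bool:
    """Return True when the major version is below 4 (the `cmake < 4.0` requirement)."""
    return parse_cmake_version(version_output)[0] < 4


def missing_host_tools(tools=("cmake", "git", "adb")) -> list:
    """Return the listed tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

A typical use would be to pass the captured output of `cmake --version` to `cmake_is_supported` and to abort early if `missing_host_tools()` returns a non-empty list.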

Python Packages (Host)

`apache-tvm-ffi` (TVM FFI bindings)
`torch` (for weight conversion)
`transformers`
`safetensors`

Credentials

The following environment variables are used:

`TVM_NDK_CC`: Path to the Android NDK C++ compiler. Required for `.so` shared library builds and Mali targets.
`ANDROID_NDK`: Android NDK root path (used by `prepare_libs.py`).
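
A small preflight check can catch unset or stale values of these two variables before a long build fails. This is a sketch under stated assumptions: `check_ndk_environment` is a hypothetical helper, not an MLC-LLM function, and it only verifies that the paths exist, not that they belong to a working NDK.

```python
import os
from pathlib import Path


def check_ndk_environment(environ=None) -> list:
    """Return human-readable problems with TVM_NDK_CC and ANDROID_NDK."""
    environ = os.environ if environ is None else environ
    problems = []

    # TVM_NDK_CC should point at the NDK clang++ executable.
    compiler = environ.get("TVM_NDK_CC")
    if not compiler:
        problems.append("TVM_NDK_CC is not set")
    elif not Path(compiler).is_file():
        problems.append(f"TVM_NDK_CC does not point to a file: {compiler}")

    # ANDROID_NDK should point at the NDK root directory.
    ndk_root = environ.get("ANDROID_NDK")
    if not ndk_root:
        problems.append("ANDROID_NDK is not set")
    elif not Path(ndk_root).is_dir():
        problems.append(f"ANDROID_NDK does not point to a directory: {ndk_root}")

    return problems
```

Calling `check_ndk_environment()` with no arguments inspects the current process environment; an empty list means both variables are set and resolve to existing paths.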

Quick Install

# Install Python dependencies on host
pip install mlc-llm

# Set NDK compiler for cross-compilation
export TVM_NDK_CC=/path/to/android-ndk/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android24-clang++

# Package model for Android
python -m mlc_llm package --device android

# Deploy weights to device via ADB
python -m mlc_llm bundle_weight --device android
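
The `TVM_NDK_CC` value above follows the standard NDK toolchain layout, so it can be derived from the NDK root. The sketch below assumes that layout; `ndk_clangxx_path` is a hypothetical helper, and the `host_tag` and `api_level` defaults simply mirror the example path (`linux-x86_64`, API 24). On macOS hosts the NDK uses the `darwin-x86_64` host tag instead.

```python
from pathlib import PurePosixPath


def ndk_clangxx_path(ndk_root: str, host_tag: str = "linux-x86_64", api_level: int = 24) -> str:
    """Build the path to the aarch64 clang++ wrapper inside an NDK installation."""
    compiler_name = f"aarch64-linux-android{api_level}-clang++"
    path = PurePosixPath(ndk_root) / "toolchains" / "llvm" / "prebuilt" / host_tag / "bin" / compiler_name
    return str(path)


# Example: print an export line for the shell (the NDK root here is a placeholder).
print(f"export TVM_NDK_CC={ndk_clangxx_path('/path/to/android-ndk')}")
```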

Code Evidence

OpenCL memory floor for Android from `config.cc:33-41`:

// Since the memory size returned by the OpenCL runtime is smaller than the actual available
// memory space, we set a best available space so that MLC LLM can run 7B or 8B models on Android
// with OpenCL.
if (device.device_type == kDLOpenCL) {
    int64_t min_size_bytes = 5LL * 1024 * 1024 * 1024;  //  Minimum size is 5 GB
    gpu_size_bytes = std::max(gpu_size_bytes, min_size_bytes);
}
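
The effect of this floor is easiest to see with numbers. The following Python sketch mirrors the C++ logic above for illustration only; `effective_gpu_bytes` is a hypothetical name and the reported sizes are made-up inputs, not measurements from a device.

```python
GIB = 1024 ** 3
OPENCL_MIN_BYTES = 5 * GIB  # same constant as the C++ snippet: 5 * 1024 * 1024 * 1024


def effective_gpu_bytes(reported_bytes: int, is_opencl: bool) -> int:
    """Apply the 5 GB floor to the reported size on OpenCL devices only."""
    if is_opencl:
        return max(reported_bytes, OPENCL_MIN_BYTES)
    return reported_bytes


# A device reporting 3 GiB is treated as having 5 GiB; a larger report is kept as-is.
print(effective_gpu_bytes(3 * GIB, is_opencl=True) // GIB)  # 5
print(effective_gpu_bytes(8 * GIB, is_opencl=True) // GIB)  # 8
```

Note that the floor changes only the estimate used for planning; it does not add physical memory, which is why the device still needs enough RAM to back it.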

Android target presets from `auto_target.py:409-441`:

"android:generic": {
    "target": {
        "kind": "opencl",
        "host": {"kind": "llvm", "mtriple": "aarch64-linux-android"},
    },
    "build": _build_android,
},
"android:adreno": {
    "target": {
        "kind": "opencl",
        "device": "adreno",
        "max_threads_per_block": 512,
        "host": {"kind": "llvm", "mtriple": "aarch64-linux-android"},
    },
    "build": _build_android,
},
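
Choosing between these two presets comes down to whether the device has an Adreno GPU. A minimal sketch of that decision follows; `choose_android_preset` is a hypothetical helper, and it assumes the caller already has a GPU name string (for example, one reported by the device) that contains "Adreno" on Qualcomm hardware.

```python
def choose_android_preset(gpu_name: str) -> str:
    """Pick a preset name from the two shown above based on the GPU name."""
    if "adreno" in gpu_name.lower():
        return "android:adreno"
    # Any other OpenCL-capable GPU falls back to the generic preset.
    return "android:generic"


print(choose_android_preset("Adreno (TM) 740"))  # android:adreno
print(choose_android_preset("Mali-G78"))         # android:generic
```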

NDK-dependent Mali build from `auto_target.py:271-287`:

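import os
from tvm import relax
from tvm.contrib import ndk

def _build_mali():
    def build(mod, args, pipeline=None):
        output = args.output  # assumed: `args.output` holds the destination path
        mod = relax.build(mod, target=args.target, relax_pipeline=pipeline, system_lib=True)
        if "TVM_NDK_CC" in os.environ:
            # NDK compiler configured: export through the NDK toolchain
            mod.export_library(str(output), fcompile=ndk.create_shared)
        else:
            # No NDK compiler configured: fall back to the default export
            mod.export_library(str(output))
    return build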

Common Errors

Error Message	Cause	Solution
`Insufficient GPU memory` on Android	OpenCL reports less memory than available	System applies 5GB floor automatically; ensure device has >= 6GB RAM
Cross-compilation failure	`TVM_NDK_CC` not set	Export `TVM_NDK_CC` pointing to NDK clang++
ADB connection failed	Device not authorized	Run `adb devices` and accept USB debugging prompt on device
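
The ADB authorization problem in the last row can be detected from the output of `adb devices`. The sketch below assumes the standard output format (a `List of devices attached` header followed by one `serial<TAB>state` line per device); the function names are hypothetical.

```python
import subprocess


def parse_adb_devices(output: str) -> dict:
    """Map device serial to state (e.g. 'device', 'unauthorized', 'offline')."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header, and daemon status lines that start with '*'.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def unauthorized_devices(output: str) -> list:
    """Return serials that still need the USB debugging prompt accepted."""
    return [serial for serial, state in parse_adb_devices(output).items() if state == "unauthorized"]


def run_adb_devices() -> str:
    """Run `adb devices` and return its stdout (requires adb on PATH)."""
    return subprocess.run(["adb", "devices"], capture_output=True, text=True, check=True).stdout
```

In practice, `unauthorized_devices(run_adb_devices())` returning a non-empty list means the prompt on the device has not been accepted yet.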

Compatibility Notes

OpenCL Memory Reporting: The Android OpenCL runtime under-reports available GPU memory. MLC-LLM works around this by treating the GPU as having at least 5GB.
Adreno vs Generic: Use `android:adreno` target for Qualcomm Snapdragon devices (higher thread limits). Use `android:generic` for other devices.
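Mali GPUs: Supported via OpenCL with NDK cross-compilation. Set `TVM_NDK_CC` so the library is exported through the NDK toolchain (`ndk.create_shared`); when it is unset, the Mali build falls back to the default export.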
System Library Mode: Android builds use system library mode (`.tar`) by default. Use `android:adreno-so` for shared library (`.so`) builds.
FlashInfer/cuBLAS: Not available on Android. Only TIR-based kernels are used.
