Environment:Mlc ai Mlc llm OpenCL Android Environment

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Mobile, GPU_Acceleration
Last Updated: 2026-02-09 19:00 GMT

Overview

Android deployment environment using OpenCL GPU backend with NDK cross-compilation for on-device LLM inference on Qualcomm Adreno and other mobile GPUs.

Description

This environment enables LLM inference on Android devices via the OpenCL GPU backend. Models are cross-compiled to `.tar` static archives (system library mode) or `.so` shared libraries using the Android NDK. The OpenCL runtime reports memory sizes smaller than actual available space, so MLC-LLM applies a minimum 5GB memory floor to support 7B/8B models. Adreno GPU variants have specialized target presets with `max_threads_per_block=512`.

Usage

Use this environment when deploying LLMs to Android devices. It is required for the Mobile Deployment workflow, including model packaging with `mlc_llm package`, building the Android app with `prepare_libs.py`, and bundling weights via ADB.

System Requirements

| Category | Requirement | Notes |
| --- | --- | --- |
| OS | Android 10+ (API 29+) | Host: Linux or macOS for cross-compilation |
| Hardware | Mobile GPU with OpenCL support | Qualcomm Adreno recommended; Mali supported |
| Toolchain | Android NDK | Required for cross-compilation via `TVM_NDK_CC` |
| VRAM | 5GB+ effective GPU memory | Runtime enforces 5GB minimum for OpenCL devices |
| Disk | 2GB+ on device | For model weights and compiled library |

Dependencies

System Packages (Host)

  • Android NDK (set `TVM_NDK_CC` environment variable)
  • `cmake` < 4.0
  • `git`
  • `adb` (Android Debug Bridge, for device deployment)

Python Packages (Host)

  • `apache-tvm-ffi` (TVM FFI bindings)
  • `torch` (for weight conversion)
  • `transformers`
  • `safetensors`

Credentials

The following environment variables are used:

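  • `TVM_NDK_CC`: Path to the Android NDK C++ compiler. Needed for `.so` shared library builds and for Mali targets. The Mali build shown under Code Evidence exports with `ndk.create_shared` only when this variable is set and otherwise falls back to the default `export_library` call.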
  • `ANDROID_NDK`: Android NDK root path (used by `prepare_libs.py`).
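
Before starting a build, it can help to confirm that both variables are set and point at paths that exist. The following is a minimal sketch, not part of MLC-LLM; the helper name `missing_ndk_settings` is hypothetical, and it checks only for presence and path existence, not for a working compiler.

```python
import os
from typing import List, Mapping


def missing_ndk_settings(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems with the NDK-related variables.

    Hypothetical helper: it checks that each variable is set and that
    the path it names exists on the host.
    """
    problems = []
    for name in ("TVM_NDK_CC", "ANDROID_NDK"):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.exists(value):
            problems.append(f"{name} points to a missing path: {value}")
    return problems


# Report problems for the current shell environment, if any.
for problem in missing_ndk_settings(os.environ):
    print(problem)
```

An empty result means both variables are set and resolve to existing paths; it does not prove that the compiler targets the right Android ABI.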

Quick Install

# Install Python dependencies on host
pip install mlc-llm

# Set NDK compiler for cross-compilation
export TVM_NDK_CC=/path/to/android-ndk/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android24-clang++

# Package model for Android
python -m mlc_llm package --device android

# Deploy weights to device via ADB
python -m mlc_llm bundle_weight --device android

Code Evidence

OpenCL memory floor for Android from `config.cc:33-41`:

// Since the memory size returned by the OpenCL runtime is smaller than the actual available
// memory space, we set a best available space so that MLC LLM can run 7B or 8B models on Android
// with OpenCL.
if (device.device_type == kDLOpenCL) {
    int64_t min_size_bytes = 5LL * 1024 * 1024 * 1024;  //  Minimum size is 5 GB
    gpu_size_bytes = std::max(gpu_size_bytes, min_size_bytes);
}
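
The arithmetic of this floor can be mirrored in a few lines of Python to see what budget a given reported size turns into. This is an illustration only, not MLC-LLM code; the function name `effective_gpu_bytes` and the `is_opencl` flag are hypothetical.

```python
GIB = 1024 ** 3
OPENCL_MIN_BYTES = 5 * GIB  # mirrors the 5 GB minimum in the C++ excerpt


def effective_gpu_bytes(reported_bytes: int, is_opencl: bool) -> int:
    """Return the memory budget after applying the OpenCL floor.

    Hypothetical helper: it only reproduces the std::max() step above.
    """
    if is_opencl:
        return max(reported_bytes, OPENCL_MIN_BYTES)
    return reported_bytes


# A device whose OpenCL runtime reports 3 GiB is budgeted as 5 GiB.
print(effective_gpu_bytes(3 * GIB, is_opencl=True) // GIB)
# A device that reports more than the floor keeps its reported value.
print(effective_gpu_bytes(8 * GIB, is_opencl=True) // GIB)
```

The floor changes only the budget that the runtime plans against; it does not add physical memory, which is presumably why the Common Errors table still asks for a device with at least 6GB of RAM.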

Android target presets from `auto_target.py:409-441`:

"android:generic": {
    "target": {
        "kind": "opencl",
        "host": {"kind": "llvm", "mtriple": "aarch64-linux-android"},
    },
    "build": _build_android,
},
"android:adreno": {
    "target": {
        "kind": "opencl",
        "device": "adreno",
        "max_threads_per_block": 512,
        "host": {"kind": "llvm", "mtriple": "aarch64-linux-android"},
    },
    "build": _build_android,
},
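
To make the difference between the two presets explicit, the sketch below copies the `target` dictionaries from the excerpt (leaving out the `build` callbacks) and lists the keys on which they differ. The names `PRESETS` and `preset_diff` are hypothetical and exist only for this illustration.

```python
# Target dictionaries copied from the excerpt above, without the "build" callbacks.
HOST = {"kind": "llvm", "mtriple": "aarch64-linux-android"}
PRESETS = {
    "android:generic": {"kind": "opencl", "host": HOST},
    "android:adreno": {
        "kind": "opencl",
        "device": "adreno",
        "max_threads_per_block": 512,
        "host": HOST,
    },
}


def preset_diff(base: str, other: str) -> dict:
    """Map each differing key to a (base_value, other_value) pair."""
    a, b = PRESETS[base], PRESETS[other]
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


print(preset_diff("android:generic", "android:adreno"))
# → {'device': (None, 'adreno'), 'max_threads_per_block': (None, 512)}
```

Both presets share the same OpenCL kind and `aarch64-linux-android` host; the Adreno preset only adds the `device` tag and the thread limit.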

NDK-dependent Mali build from `auto_target.py:271-287`:

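import os
from tvm import relax
from tvm.contrib import ndk

def _build_mali():
    def build(mod, args, pipeline=None):
        output = args.output  # assumed: output path from the compile arguments
        mod = relax.build(mod, target=args.target, relax_pipeline=pipeline, system_lib=True)
        if "TVM_NDK_CC" in os.environ:
            # Cross-compile with the NDK toolchain when it is configured.
            mod.export_library(str(output), fcompile=ndk.create_shared)
        else:
            # Otherwise fall back to the default export.
            mod.export_library(str(output))
    return build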

Common Errors

| Error Message | Cause | Solution |
| --- | --- | --- |
| `Insufficient GPU memory` on Android | OpenCL reports less memory than available | System applies 5GB floor automatically; ensure device has >= 6GB RAM |
| Cross-compilation failure | `TVM_NDK_CC` not set | Export `TVM_NDK_CC` pointing to NDK clang++ |
| ADB connection failed | Device not authorized | Run `adb devices` and accept the USB debugging prompt on the device |
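
For the ADB case, the state column printed by `adb devices` tells an authorized device (`device`) apart from one that is still waiting for the USB debugging prompt (`unauthorized`). The sketch below parses that text output; the helper name `parse_adb_devices` is hypothetical and the serials in the sample are made up.

```python
def parse_adb_devices(output: str) -> dict:
    """Map each device serial to its state from `adb devices` text output."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the "List of devices attached" header, and daemon notices.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


# Made-up sample output with one unauthorized device and one ready emulator.
sample = "List of devices attached\nABC123\tunauthorized\nemulator-5554\tdevice\n"
states = parse_adb_devices(sample)
unauthorized = [serial for serial, state in states.items() if state == "unauthorized"]
print(unauthorized)
# → ['ABC123']
```

In practice the input would come from running `adb devices` on the host, for example through `subprocess.run(["adb", "devices"], capture_output=True, text=True).stdout`.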

Compatibility Notes

  • OpenCL Memory Reporting: Android OpenCL runtime under-reports available GPU memory. MLC-LLM applies a 5GB minimum floor to work around this.
  • Adreno vs Generic: Use the `android:adreno` target for Qualcomm Snapdragon devices; its preset sets `device` to `adreno` and `max_threads_per_block` to 512. Use `android:generic` for other devices.
  • Mali GPUs: Supported via OpenCL with NDK cross-compilation. Set the `TVM_NDK_CC` environment variable so that the library is exported with the NDK toolchain.
  • System Library Mode: Android builds use system library mode (`.tar`) by default. Use `android:adreno-so` for shared library (`.so`) builds.
  • FlashInfer/cuBLAS: Not available on Android. Only TIR-based kernels are used.
