Environment:Mlc ai Mlc llm OpenCL Android Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Mobile, GPU_Acceleration |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
Android deployment environment using OpenCL GPU backend with NDK cross-compilation for on-device LLM inference on Qualcomm Adreno and other mobile GPUs.
Description
This environment enables LLM inference on Android devices via the OpenCL GPU backend. Models are cross-compiled with the Android NDK to `.tar` static archives (system library mode) or `.so` shared libraries. Because the OpenCL runtime reports less memory than is actually available, MLC-LLM treats the GPU memory size as at least 5GB so that 7B and 8B models can run. Adreno GPUs have specialized target presets with `max_threads_per_block=512`.
Usage
Use this environment when deploying LLMs to Android devices. It is required for the Mobile Deployment workflow, which covers model packaging with `mlc_llm package`, building the Android app libraries with `prepare_libs.py`, and bundling weights onto the device via ADB.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Android 10+ (API 29+) | Host: Linux or macOS for cross-compilation |
| Hardware | Mobile GPU with OpenCL support | Qualcomm Adreno recommended; Mali supported |
| Toolchain | Android NDK | Required for cross-compilation via `TVM_NDK_CC` |
| GPU memory | 5GB+ effective GPU memory | Mobile GPUs share system RAM; the runtime treats the size reported by OpenCL as at least 5GB |
| Disk | 2GB+ on device | For model weights and compiled library |
Dependencies
System Packages (Host)
- Android NDK (set `TVM_NDK_CC` environment variable)
- `cmake` < 4.0
- `git`
- `adb` (Android Debug Bridge, for device deployment)
Python Packages (Host)
- `apache-tvm-ffi` (TVM FFI bindings)
- `torch` (for weight conversion)
- `transformers`
- `safetensors`
Credentials
The following environment variables are used:
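- `TVM_NDK_CC`: Path to the Android NDK C++ compiler. Needed for `.so` shared library builds and Mali targets; if it is unset, the Mali build path exports the library with the default host compiler instead of the NDK.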
- `ANDROID_NDK`: Android NDK root path (used by `prepare_libs.py`).
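A quick preflight check can catch missing variables before a long build starts. The sketch below is a hypothetical helper (`check_android_env` is not part of MLC-LLM); it only assumes that both variables should be set and should point to paths that exist on the host.

```python
import os
from pathlib import Path


def check_android_env(env=None):
    """Return a list of human-readable problems with the NDK variables.

    Hypothetical helper for illustration; not an MLC-LLM API.
    """
    env = os.environ if env is None else env
    problems = []
    for name in ("TVM_NDK_CC", "ANDROID_NDK"):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).exists():
            problems.append(f"{name} points to a missing path: {value}")
    return problems
```

Calling `check_android_env()` with no arguments inspects the current process environment; an empty list means both variables look usable.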
Quick Install
# Install Python dependencies on host
pip install mlc-llm
# Set NDK compiler for cross-compilation
export TVM_NDK_CC=/path/to/android-ndk/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android24-clang++
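# Package the model for Android. Run from the app directory (e.g. android/MLCChat);
# the device is set by "device": "android" in mlc-package-config.json, not by a CLI flag
export MLC_LLM_SOURCE_DIR=/path/to/mlc-llm
python -m mlc_llm package
# Deploy weights to the device via ADB with the script in the app directory
python bundle_weight.py --apk-path app/release/app-release.apk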
Code Evidence
OpenCL memory floor for Android from `config.cc:33-41`:
// Since the memory size returned by the OpenCL runtime is smaller than the actual available
// memory space, we set a best available space so that MLC LLM can run 7B or 8B models on Android
// with OpenCL.
if (device.device_type == kDLOpenCL) {
  int64_t min_size_bytes = 5LL * 1024 * 1024 * 1024;  // Minimum size is 5 GB
  gpu_size_bytes = std::max(gpu_size_bytes, min_size_bytes);
}
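The effect of the floor is easiest to see with concrete numbers. The following Python sketch mirrors the C++ logic above; the function and constant names are illustrative and do not come from the MLC-LLM code base.

```python
GIB = 1024 ** 3
MIN_OPENCL_BYTES = 5 * GIB  # same 5 GB floor as the C++ snippet


def effective_gpu_bytes(reported_bytes: int, is_opencl: bool) -> int:
    """Apply the floor only to OpenCL devices; other devices are unchanged."""
    if is_opencl:
        return max(reported_bytes, MIN_OPENCL_BYTES)
    return reported_bytes
```

An OpenCL device that reports 3 GiB is therefore treated as having 5 GiB, while a device that reports 8 GiB keeps its reported value. The floor does not create memory, so a device with too little physical RAM can still fail at run time.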
Android target presets from `auto_target.py:409-441`:
"android:generic": {
    "target": {
        "kind": "opencl",
        "host": {"kind": "llvm", "mtriple": "aarch64-linux-android"},
    },
    "build": _build_android,
},
"android:adreno": {
    "target": {
        "kind": "opencl",
        "device": "adreno",
        "max_threads_per_block": 512,
        "host": {"kind": "llvm", "mtriple": "aarch64-linux-android"},
    },
    "build": _build_android,
},
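To make the difference between the two presets explicit, the sketch below copies only the `target` dictionaries from the excerpt into a stand-in table and computes which keys the Adreno preset adds. `PRESETS` and `preset_overrides` are illustrative names, and the `build` entries are omitted because `_build_android` is not shown here.

```python
ANDROID_HOST = {"kind": "llvm", "mtriple": "aarch64-linux-android"}

# Stand-in copy of the "target" entries from the excerpt above.
PRESETS = {
    "android:generic": {"kind": "opencl", "host": ANDROID_HOST},
    "android:adreno": {
        "kind": "opencl",
        "device": "adreno",
        "max_threads_per_block": 512,
        "host": ANDROID_HOST,
    },
}


def preset_overrides(base: str, variant: str) -> dict:
    """Return the keys that `variant` adds or changes relative to `base`."""
    base_target, variant_target = PRESETS[base], PRESETS[variant]
    return {k: v for k, v in variant_target.items() if base_target.get(k) != v}
```

Comparing `android:generic` with `android:adreno` yields only the `device` and `max_threads_per_block` keys; the OpenCL kind and the AArch64 Android host are shared.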
NDK-dependent Mali build from `auto_target.py:271-287`:
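def _build_mali():
    def build(mod, args, pipeline=None):
        output = args.output  # destination path taken from the compile arguments
        mod = relax.build(mod, target=args.target, relax_pipeline=pipeline, system_lib=True)
        if "TVM_NDK_CC" in os.environ:
            # Link with the NDK compiler so the library targets Android
            mod.export_library(str(output), fcompile=ndk.create_shared)
        else:
            # Fall back to the default host compiler
            mod.export_library(str(output))

    return build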
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `Insufficient GPU memory` on Android | OpenCL reports less memory than available | System applies 5GB floor automatically; ensure device has >= 6GB RAM |
| Cross-compilation failure | `TVM_NDK_CC` not set | Export `TVM_NDK_CC` pointing to NDK clang++ |
| ADB connection failed | Device not authorized | Run `adb devices` and accept USB debugging prompt on device |
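For the ADB authorization error, the output of `adb devices` can be checked programmatically. This sketch parses that text into a serial-to-state map; the function names are hypothetical, and the serials in the usage note are made up.

```python
def parse_adb_devices(output: str) -> dict:
    """Map each device serial to its state (e.g. 'device', 'unauthorized')."""
    states = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the "List of devices attached" header, and daemon notices.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            states[parts[0]] = parts[1]
    return states


def unauthorized_serials(output: str) -> list:
    """Return serials whose USB debugging prompt has not been accepted."""
    return [s for s, state in parse_adb_devices(output).items() if state == "unauthorized"]
```

Given text such as `"List of devices attached\nABC123\tdevice\nXYZ789\tunauthorized\n"`, `unauthorized_serials` returns `["XYZ789"]`, which is the device that still needs the prompt accepted on screen.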
Compatibility Notes
- OpenCL Memory Reporting: The Android OpenCL runtime under-reports available GPU memory. MLC-LLM works around this by treating the reported size as at least 5GB.
- Adreno vs Generic: Use the `android:adreno` target for Qualcomm Snapdragon devices; it sets `max_threads_per_block` to 512. Use `android:generic` for other devices.
- Mali GPUs: Supported via OpenCL with NDK cross-compilation. Set `TVM_NDK_CC` so the library is linked with the NDK compiler; without it, the Mali build path falls back to the default host compiler.
- System Library Mode: Android builds use system library mode (`.tar`) by default. Use `android:adreno-so` for shared library (`.so`) builds.
- FlashInfer/cuBLAS: Not available on Android. Only TIR-based kernels are used.
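The target choice described in these notes can be summarized as a small decision function. This is only a sketch: `suggest_android_target` is a hypothetical helper, it assumes the caller already has a GPU name string containing "Adreno" for Qualcomm devices, and the fallback to `android:generic` for a non-Adreno shared-library request is an assumption, since the notes name a `.so` preset only for Adreno.

```python
def suggest_android_target(gpu_name: str, shared_library: bool = False) -> str:
    """Pick a preset name from the notes above based on the GPU name."""
    is_adreno = "adreno" in gpu_name.lower()
    if is_adreno and shared_library:
        return "android:adreno-so"  # shared library (.so) build
    if is_adreno:
        return "android:adreno"  # system library mode (.tar)
    # Assumption: no dedicated .so preset is documented for other GPUs.
    return "android:generic"
```

For example, a name containing "Adreno" maps to `android:adreno`, and a Mali name maps to `android:generic`.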