Environment:Mlc ai Mlc llm Metal macOS iOS Environment

From Leeroopedia


Knowledge Sources
Domains Infrastructure, Mobile, GPU_Acceleration
Last Updated 2026-02-09 19:00 GMT

Overview

Apple Metal GPU environment for on-device LLM inference on macOS desktops and iOS devices, using the Xcode toolchain for Metal shader compilation and static library linking.

Description

This environment enables LLM inference on Apple Silicon Macs (M1/M2/M3/M4) and iOS devices through the Metal GPU backend. On macOS, models are compiled to `.dylib` shared libraries. On iOS, models are compiled to `.tar` static archives that are linked into Xcode projects through the MLCSwift framework. The `iphone:generic` target preset declares a thread warp size of 1, at most 256 threads per block, and at most 32 KB (32768 bytes) of shared memory per block, all of which constrain kernel scheduling. FlashInfer is not available on Metal, so the environment uses the TIR-based PagedKVCache instead.
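The platform-to-artifact mapping described above can be sketched as a small helper. This is only an illustration: `ARTIFACT_SUFFIX`, `model_lib_path`, the `dist/libs` directory, and the `-metal` naming are hypothetical and not part of MLC LLM; only the two suffixes come from the description.

```python
from pathlib import Path

# Hypothetical mapping for illustration; only the suffixes come from the text.
ARTIFACT_SUFFIX = {
    "macos": ".dylib",  # shared library loaded at runtime
    "ios": ".tar",      # static archive linked into the Xcode project
}


def model_lib_path(model_name: str, platform: str, out_dir: str = "dist/libs") -> Path:
    """Return an output path carrying the suffix expected for the platform."""
    try:
        suffix = ARTIFACT_SUFFIX[platform]
    except KeyError:
        raise ValueError(f"unsupported platform: {platform!r}") from None
    return Path(out_dir) / f"{model_name}-metal{suffix}"


print(model_lib_path("my-model", "macos"))
print(model_lib_path("my-model", "ios"))
```

Choosing the suffix up front avoids tripping the `.tar` assertion in the iOS build function when a `.dylib` path is passed by mistake.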

Usage

Use this environment when deploying LLMs to macOS desktops or iOS devices (iPhone/iPad). It is required for the Mobile Deployment workflow and for running MLCEngine on Apple platforms.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| OS | macOS 13+ (Ventura) / iOS 16+ | Apple Silicon (M1+) recommended for macOS |
| Hardware | Apple GPU (Metal-capable) | M1/M2/M3/M4 for macOS; A14+ for iOS |
| Toolchain | Xcode 14+ | Required for iOS builds and Metal shader compilation |
| Disk | 5 GB+ | For model weights and compiled libraries |
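As a rough pre-flight check for the macOS requirement, the host version can be compared against the minimum. This is a minimal sketch using only the standard library; `parse_version` and `macos_ok` are hypothetical helper names, and the check does not cover the Xcode or disk requirements.

```python
import platform

MIN_MACOS = (13,)  # from the requirements: macOS 13+ (Ventura)


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '13.4.1' into (13, 4, 1)."""
    return tuple(int(part) for part in text.split("."))


def macos_ok(version_text: str, minimum: tuple = MIN_MACOS) -> bool:
    """True when the given macOS version is at least the minimum."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    current = platform.mac_ver()[0]  # empty string on non-macOS hosts
    if current:
        print(f"macOS {current}: {'ok' if macos_ok(current) else 'too old'}")
    else:
        print("not running on macOS; skipping check")
```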

Dependencies

System Packages

  • `xcode` (Apple Xcode with Metal SDK)
  • `cmake` < 4.0
  • `git`

iOS Build Dependencies

  • MLCSwift framework (included in repository at `ios/MLCSwift/`)
  • `prepare_libs.sh` script (builds static libraries for device or simulator)

Python Packages

  • `apache-tvm-ffi` (TVM FFI bindings)
  • `torch` (for weight conversion)
  • `transformers`
  • `safetensors`

Credentials

No special credentials are required for macOS builds. An Apple Developer account is needed to deploy to physical iOS devices.

Quick Install

# For macOS compilation
pip install mlc-llm

# For iOS: build static libraries
cd ios && ./prepare_libs.sh

# For iOS simulator target
cd ios && ./prepare_libs.sh --simulator

Code Evidence

Metal target preset from `auto_target.py:394-407`:

"iphone:generic": {
    "target": {
        "kind": "metal",
        "max_threads_per_block": 256,
        "max_shared_memory_per_block": 32768,
        "thread_warp_size": 1,
        "libs": ["iphoneos"],
        "host": {
            "kind": "llvm",
            "mtriple": "arm64-apple-darwin",
        },
    },
    "build": _build_iphone,
},
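To make the preset limits concrete, the sketch below checks a kernel launch configuration against them. The limit values are copied from the preset above; `fits_preset` and the tile sizes are hypothetical and only illustrate how the 256-thread and 32768-byte ceilings constrain a schedule.

```python
# Limits copied from the "iphone:generic" preset quoted above.
IPHONE_LIMITS = {
    "max_threads_per_block": 256,
    "max_shared_memory_per_block": 32768,  # bytes
}


def fits_preset(threads_per_block: int, shared_bytes: int,
                limits: dict = IPHONE_LIMITS) -> bool:
    """Hypothetical helper: True if a launch config stays within the limits."""
    return (
        0 < threads_per_block <= limits["max_threads_per_block"]
        and 0 <= shared_bytes <= limits["max_shared_memory_per_block"]
    )


# A 64x64 float16 tile needs 64 * 64 * 2 = 8192 bytes of shared memory.
tile_bytes = 64 * 64 * 2
print(fits_preset(256, tile_bytes))     # within both limits
print(fits_preset(512, tile_bytes))     # too many threads per block
print(fits_preset(256, 128 * 128 * 4))  # 65536 bytes exceeds 32768
```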

Metal KV cache capacity limit from `config.cc:746-751`:

if (device.device_type == DLDeviceType::kDLMetal) {
    // NOTE: Metal runtime has severe performance issues with large buffers.
    // To work around the issue, we limit the KV cache capacity to 32768.
    model_max_total_sequence_length =
        std::min(model_max_total_sequence_length, static_cast<int64_t>(32768));
}
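The effect of this clamp can be mirrored in a few lines of Python. This is a sketch of the same `min` logic, not the engine's actual code path; `effective_kv_capacity` and the string device names are hypothetical.

```python
METAL_MAX_KV_CAPACITY = 32768  # value from the C++ excerpt above


def effective_kv_capacity(requested: int, device_type: str) -> int:
    """Clamp the requested total sequence length on Metal; pass through elsewhere."""
    if device_type == "metal":
        return min(requested, METAL_MAX_KV_CAPACITY)
    return requested


print(effective_kv_capacity(131072, "metal"))  # 32768
print(effective_kv_capacity(8192, "metal"))    # 8192
print(effective_kv_capacity(131072, "cuda"))   # 131072
```

A request below the cap is left unchanged, so the limit only matters for models configured with very long contexts.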

iOS build function from `auto_target.py:161-182`:

def _build_iphone():
    @register_global_func("tvm_callback_metal_compile", override=True)
    def compile_metal(src, target):
        if target.libs:
            return xcode.compile_metal(src, sdk=target.libs[0])
        return xcode.compile_metal(src)

    def build(mod, args, pipeline=None):
        output = args.output
        mod = _add_system_lib_prefix(mod, args.system_lib_prefix, is_system_lib=True)
        assert output.suffix == ".tar"
        relax.build(mod, target=args.target, relax_pipeline=pipeline,
                    system_lib=True).export_library(str(output), fcompile=tar.tar)

    return build
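The build function applies a system-lib prefix before exporting the `.tar` archive. One plausible reason, stated here as an assumption rather than taken from the source, is that several statically linked model libraries would otherwise export identical symbol names. The sketch below illustrates the idea with hypothetical symbol names and helpers; it does not reproduce `_add_system_lib_prefix`.

```python
def prefixed_symbols(symbols, prefix):
    """Return the symbol names with the given prefix prepended."""
    return [prefix + name for name in symbols]


def find_collisions(*symbol_lists):
    """Return the sorted names that appear more than once across all lists."""
    seen, duplicates = set(), set()
    for symbols in symbol_lists:
        for name in symbols:
            if name in seen:
                duplicates.add(name)
            seen.add(name)
    return sorted(duplicates)


# Hypothetical exported entry points shared by two compiled models.
exports = ["prefill", "decode"]

# Without prefixes, both models export the same names.
print(find_collisions(exports, exports))

# With a distinct prefix per model, the names no longer clash.
model_a = prefixed_symbols(exports, "model_a_")
model_b = prefixed_symbols(exports, "model_b_")
print(find_collisions(model_a, model_b))
```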

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| Metal shader compilation failure | Xcode SDK mismatch | Ensure Xcode Command Line Tools are installed: `xcode-select --install` |
| KV cache limited to 32768 tokens | Metal large buffer performance workaround | This is intentional; use CUDA for larger context windows |
| `--system-lib-prefix is not specified` | Missing prefix for static library build | Pass `--system-lib-prefix` flag or let auto-detection handle it |

Compatibility Notes

  • KV Cache Limit: Metal runtime has severe performance issues with large buffers. MLC-LLM automatically caps KV cache capacity at 32768 tokens on Metal devices.
  • FlashInfer: Not available on Metal. The TIR-based PagedKVCache is used instead.
  • Thread Warp Size: The `iphone:generic` Metal preset sets `thread_warp_size` to 1 (CUDA targets use 32), which affects kernel scheduling.
  • Optimization Flags: cuBLAS, CUTLASS, CUDA graphs, and FlashInfer are all disabled on Metal. Only TIR-based kernels are used.
  • iOS Simulator: Use `prepare_libs.sh --simulator` for x86_64 simulator builds.
