Implementation:Mlc ai Mlc llm Auto Device

Overview

python/mlc_llm/support/auto_device.py provides automatic detection of locally available compute devices (GPUs and CPU) for the MLC LLM runtime. It checks each supported device type by spawning a subprocess, caches the results, and returns an appropriate TVM Device object. Its entry point, detect_device, handles the "auto" device hint and also validates explicit device hints using the same probe.

Location

File: python/mlc_llm/support/auto_device.py
Module: mlc_llm.support.auto_device
Lines: 95

Module-Level Constants

FOUND = green("Found")
NOT_FOUND = red("Not found")
AUTO_DETECT_DEVICES = ["cuda", "rocm", "metal", "vulkan", "opencl", "cpu"]
_RESULT_CACHE: Dict[str, bool] = {}

AUTO_DETECT_DEVICES: The ordered list of device types to probe during automatic detection. The order reflects priority: CUDA and ROCm (discrete GPUs) are checked first, followed by Metal and Vulkan, then OpenCL, and finally CPU as a fallback.
_RESULT_CACHE: A module-level dictionary caching device existence results to avoid redundant subprocess calls within a session.

Functions

detect_device

def detect_device(device_hint: str) -> Optional[Device]:

The primary entry point for device detection.

When device_hint is "auto": Iterates through AUTO_DETECT_DEVICES in order, probing index 0 of each device type, and returns the first device found. If no device is found, it logs a message and returns None.

if device_hint == "auto":
    device = None
    for device_type in AUTO_DETECT_DEVICES:
        cur_device = tvm.device(device_type=device_type, index=0)
        if _device_exists(cur_device):
            if device is None:
                device = cur_device
    if device is None:
        logger.info("%s: No available device detected", NOT_FOUND)
        return None
    logger.info("Using device: %s", bold(device2str(device)))
    return device

Note that the loop does not stop at the first match: every device type is probed, so each one's availability is logged and recorded in _RESULT_CACHE, but only the first device found is returned.
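A TVM-free sketch of the "auto" branch follows. Devices are modelled as plain "type:index" strings, and fake_exists is a hypothetical availability check standing in for _device_exists. It records each probe, which shows that all six types are visited even though only the first match is returned.

```python
from typing import Callable, List, Optional

AUTO_DETECT_DEVICES = ["cuda", "rocm", "metal", "vulkan", "opencl", "cpu"]


def detect_auto(exists: Callable[[str], bool]) -> Optional[str]:
    """Sketch of the "auto" branch: probe every type, keep the first hit."""
    device: Optional[str] = None
    for device_type in AUTO_DETECT_DEVICES:
        cur_device = f"{device_type}:0"  # always probe index 0
        # exists() is called first, so every type is probed even after a match.
        if exists(cur_device) and device is None:
            device = cur_device
    return device


probed: List[str] = []


def fake_exists(device_str: str) -> bool:
    # Hypothetical machine with a Vulkan GPU and a CPU.
    probed.append(device_str)
    return device_str in {"vulkan:0", "cpu:0"}


print(detect_auto(fake_exists))  # vulkan:0
print(len(probed))  # 6
```

Even though cpu:0 is also available, vulkan:0 wins because it comes earlier in the priority list.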

When device_hint is a specific device: Creates a TVM device from the hint string and validates its existence. Raises ValueError if the device name is invalid or the device is not found locally.

try:
    device = tvm.device(device_hint)
except Exception as err:
    raise ValueError(f"Invalid device name: {device_hint}") from err
if not _device_exists(device):
    raise ValueError(f"Device is not found on your local environment: {device_hint}")
return device
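The explicit-hint path can be sketched the same way. This version does not call tvm.device. Instead, parse_hint is a hypothetical parser that accepts "type" or "type:index" (index defaulting to 0) and raises ValueError for unknown names. detect_specific then mirrors the two documented error messages.

```python
from typing import Callable, Tuple

KNOWN_TYPES = {"cuda", "rocm", "metal", "vulkan", "opencl", "cpu"}


def parse_hint(device_hint: str) -> Tuple[str, int]:
    """Hypothetical parser for hints like "cuda" or "cuda:1"."""
    device_type, _, index = device_hint.partition(":")
    if device_type not in KNOWN_TYPES or (index and not index.isdigit()):
        raise ValueError(f"Invalid device name: {device_hint}")
    return device_type, int(index or 0)


def detect_specific(device_hint: str, exists: Callable[[str], bool]) -> str:
    device_type, index = parse_hint(device_hint)  # may raise "Invalid device name"
    device_str = f"{device_type}:{index}"
    if not exists(device_str):
        raise ValueError(f"Device is not found on your local environment: {device_hint}")
    return device_str


available = {"cuda:0", "cpu:0"}
print(detect_specific("cuda", available.__contains__))  # cuda:0
try:
    detect_specific("cuda:1", available.__contains__)
except ValueError as err:
    print(err)  # Device is not found on your local environment: cuda:1
```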

device2str

def device2str(device: Device) -> str:

Converts a TVM Device object to a human-readable string in the format "device_type:index" (e.g., "cuda:0").

Implementation:

return f"{tvm.runtime.Device._DEVICE_TYPE_TO_NAME[device.dlpack_device_type()]}:{device.index}"

Uses TVM's internal _DEVICE_TYPE_TO_NAME mapping to convert the DLPack device type enum to a string name.
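Because _DEVICE_TYPE_TO_NAME is a private TVM attribute, the sketch below replaces it with a hand-written table of DLPack device codes (kDLCPU=1, kDLCUDA=2, kDLOpenCL=4, kDLVulkan=7, kDLMetal=8, kDLROCM=10). It also uses a hypothetical FakeDevice class that exposes dlpack_device_type() and index. The sketch shows only the string formatting, not TVM's actual table.

```python
from dataclasses import dataclass

# DLPack device-type codes mapped to the names used in this page.
# Hand-written stand-in for tvm.runtime.Device._DEVICE_TYPE_TO_NAME.
DEVICE_TYPE_TO_NAME = {1: "cpu", 2: "cuda", 4: "opencl", 7: "vulkan", 8: "metal", 10: "rocm"}


@dataclass(frozen=True)
class FakeDevice:
    """Hypothetical minimal device: a DLPack type code plus an index."""

    device_type: int
    index: int

    def dlpack_device_type(self) -> int:
        return self.device_type


def device2str(device: FakeDevice) -> str:
    # "device_type:index", e.g. "cuda:0"
    return f"{DEVICE_TYPE_TO_NAME[device.dlpack_device_type()]}:{device.index}"


print(device2str(FakeDevice(2, 0)))  # cuda:0
print(device2str(FakeDevice(8, 1)))  # metal:1
```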

_device_exists (Private)

def _device_exists(device: Device) -> bool:

Checks whether a specific device exists on the local machine by spawning a subprocess.

Process:

Checks the _RESULT_CACHE for a cached result.
If not cached, runs a subprocess command:

cmd = [sys.executable, "-m", "mlc_llm.cli.check_device", device_type]

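The subprocess is run via subprocess.run with capture_output=True, text=True, check=False (so a non-zero exit code does not raise), and env=os.environ (so the child inherits the current environment).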

Output parsing: The subprocess output is expected to contain lines prefixed with "check_device:". The function extracts the content after this prefix, which is a comma-separated list of available device indices.

prefix = "check_device:"
subproc_outputs = [
    line[len(prefix):].strip()  # keep only the text after "check_device:"
    for line in subprocess.run(
        cmd, capture_output=True, text=True, check=False, env=os.environ
    ).stdout.strip().splitlines()
    if line.startswith(prefix)
]
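The parsing step can be exercised without MLC LLM installed. In this sketch the child command is a hypothetical stand-in (python -c printing a fake probe result), not mlc_llm.cli.check_device. The subprocess.run arguments and the prefix filter match the code above.

```python
import os
import subprocess
import sys
from typing import List

prefix = "check_device:"

# Hypothetical stand-in child: prints some noise plus one "check_device:" line.
child_src = "print('loading drivers...'); print('check_device:0,1')"
cmd = [sys.executable, "-c", child_src]

subproc_outputs: List[str] = [
    line[len(prefix):].strip()
    for line in subprocess.run(
        cmd, capture_output=True, text=True, check=False, env=os.environ
    ).stdout.strip().splitlines()
    if line.startswith(prefix)
]

print(subproc_outputs)  # ['0,1']
indices = subproc_outputs[0].split(",") if subproc_outputs else []
print(indices)  # ['0', '1']
```

Lines without the prefix, such as driver chatter, are silently discarded.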

Cache population: For each discovered device index, the result is cached as True in _RESULT_CACHE with the key "device_type:index". For CPU devices (kDLCPU), only the first index is cached (via break).

Error handling: If no "check_device:" output lines are found, an error is logged asking the user to report the issue with the subprocess command.

If the device string is still not in the cache after processing, it is cached as False.

Detection Flow

The overall device detection flow is:

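User provides "auto" or a specific device string.
For "auto", iterate through ["cuda", "rocm", "metal", "vulkan", "opencl", "cpu"], probing index 0 of each type; for a specific string, create a TVM device from it and probe only that device.
For each probe whose result is not already cached, spawn python -m mlc_llm.cli.check_device <type>.
Parse the output for available device indices and cache the results.
Return the first available device ("auto") or the validated device (specific hint); "auto" returns None when nothing is found, while a specific hint raises ValueError.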

Dependencies

tvm: For tvm.device() creation and Device type constants.
tvm_ffi.DLDeviceType: For the kDLCPU constant used in CPU-specific logic.
subprocess: For spawning the device check process.
os: For passing the current environment to the subprocess.
sys: For getting the current Python executable path.
mlc_llm.support.logging: Provides the module logger used for the device-detection info and error messages.
mlc_llm.support.style: For styled terminal output (bold, green, red).

Design Notes

Device detection is performed in a separate subprocess via mlc_llm.cli.check_device to isolate potential crashes or GPU driver issues from the main process.
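The module-level _RESULT_CACHE ensures that each device string (for example "cuda:0") is resolved by at most one subprocess call per process lifetime, and a single probe can cache several indices at once, avoiding expensive repeated subprocess calls.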
The detection priority order places GPU backends first and CPU last, ensuring GPU acceleration is preferred when available.
The CPU special case (breaking after first index) avoids unnecessary enumeration since CPU is not typically multi-indexed in the same way as GPUs.
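The isolation argument can be checked directly: a probe child that exits abruptly affects only its own return code. The crashing child here is hypothetical (os._exit(3) simulating a driver failure). Because check=False is used, the parent does not raise and simply finds no "check_device:" lines.

```python
import subprocess
import sys

# Hypothetical probe child that dies before reporting anything.
crashing_child = "import os; os._exit(3)"

result = subprocess.run(
    [sys.executable, "-c", crashing_child], capture_output=True, text=True, check=False
)

# The parent keeps running; no prefixed lines means "device not found".
found = [line for line in result.stdout.splitlines() if line.startswith("check_device:")]
print(result.returncode, found)  # 3 []
```

Had the same failure happened inside the main interpreter, it would have terminated the caller instead of yielding an empty result.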
