Implementation:Mlc ai Mlc llm Download and cache mlc weights

Knowledge Sources	MLC-LLM
Domains	Deep_Learning, Model_Deployment, Data_Management
Last Updated	2026-02-09 00:00 GMT

Overview

Concrete tool for obtaining pre-trained model weights from remote HuggingFace repositories with local caching and MD5 integrity validation, provided by MLC-LLM.

Description

The download_and_cache_mlc_weights function handles the complete lifecycle of downloading MLC-format model weights from HuggingFace. It parses a model URL (either HF://user/repo or https://huggingface.co/user/repo) and checks a multi-level cache hierarchy (read-only caches, writable cache). If the weights are not already cached, it performs a parallel download.

The download proceeds in three steps. It first clones the Git repository metadata, skipping large binary files. It then reads the tensor-cache.json manifest to discover the weight shards. Finally, it downloads the shards in parallel using a process pool and verifies each one against its MD5 checksum. The completed download is atomically moved from a temporary directory to the cache location.

The function respects the MLC_DOWNLOAD_CACHE_POLICY environment variable, which supports the ON, OFF, REDO, and READONLY modes.
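
The checksum verification and atomic move described above can be illustrated with a minimal, standard-library sketch. It is not the MLC-LLM implementation: the helper names `md5_of_file` and `verify_and_publish`, the `expected_md5` mapping, and the shard file name used in the usage note are all hypothetical and chosen only for illustration.

```python
import hashlib
import os
from pathlib import Path
from typing import Dict


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_publish(
    staging_dir: Path, expected_md5: Dict[str, str], cache_dir: Path
) -> Path:
    """Hypothetical helper: verify every shard, then move the directory into place.

    `expected_md5` maps a file name inside `staging_dir` to its expected digest.
    """
    for name, expected in expected_md5.items():
        actual = md5_of_file(staging_dir / name)
        if actual != expected:
            raise ValueError(
                f"MD5 mismatch for {name}: expected {expected}, got {actual}"
            )
    cache_dir.parent.mkdir(parents=True, exist_ok=True)
    # A rename is atomic on POSIX when both paths are on the same filesystem;
    # this sketch assumes cache_dir does not exist yet.
    os.replace(staging_dir, cache_dir)
    return cache_dir
```

Verifying before the move means a reader of the cache directory never observes a partially downloaded or corrupt set of shards, which is presumably why the download is staged in a temporary directory first.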

Usage

Use this function when you need to programmatically download pre-quantized MLC model weights before running weight conversion or starting an inference engine. It is the primary entry point for fetching weights in the MLC-LLM compilation pipeline.

Code Reference

Source Location

Repository: MLC-LLM
File: python/mlc_llm/support/download_cache.py (lines 127-199)

Signature

def download_and_cache_mlc_weights(
    model_url: str,
    num_processes: int = 4,
    force_redo: Optional[bool] = None,
) -> Path:
    """Download weights for a model from the HuggingFace Git LFS repo."""

Import

from mlc_llm.support.download_cache import download_and_cache_mlc_weights

I/O Contract

Inputs

Name	Type	Required	Description
model_url	str	Yes	The HuggingFace model URL. Must start with `HF://` or `https://huggingface.co/`, followed by `user/repo`. Example: `HF://mlc-ai/Llama-2-7b-chat-q4f16_1-MLC`.
num_processes	int	No (default: 4)	Number of parallel worker processes to use when downloading weight shard files. Controls the concurrency of the `ProcessPoolExecutor`.
force_redo	Optional[bool]	No (default: None)	Whether to force re-downloading weights even if they already exist in the cache. When `None`, the behavior is determined by the `MLC_DOWNLOAD_CACHE_POLICY` environment variable: if the policy is `REDO`, existing cache entries are deleted and the weights are downloaded again.
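
To make the `model_url` format concrete, the following sketch splits a URL into its `user` and `repo` parts and rejects anything else with a `ValueError`. The helper `split_model_url` is hypothetical and written only for this page; the parsing inside MLC-LLM may differ in detail.

```python
from typing import Tuple

# The two prefixes named in the inputs table above.
HF_PREFIXES = ("HF://", "https://huggingface.co/")


def split_model_url(model_url: str) -> Tuple[str, str]:
    """Hypothetical helper: return (user, repo) for a supported model URL."""
    for prefix in HF_PREFIXES:
        if model_url.startswith(prefix):
            parts = model_url[len(prefix):].strip("/").split("/")
            if len(parts) == 2 and all(parts):
                return parts[0], parts[1]
            break
    raise ValueError(
        f"Invalid model URL {model_url!r}; expected HF://user/repo "
        "or https://huggingface.co/user/repo"
    )


user, repo = split_model_url("HF://mlc-ai/Llama-2-7b-chat-q4f16_1-MLC")
print(user, repo)
```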

Outputs

Name	Type	Description
return value	Path	The local filesystem path to the directory containing the downloaded and cached model weights. This directory contains the weight shard files, `tensor-cache.json`, `mlc-chat-config.json`, and other model artifacts.
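
A caller may want to sanity-check the returned directory before handing it to later stages. The sketch below checks only for the two JSON files named in the outputs table; the helper `missing_artifacts` is hypothetical, and the weight shard file names are deliberately not assumed.

```python
from pathlib import Path
from typing import List

# File names taken from the outputs table above.
EXPECTED_FILES = ("tensor-cache.json", "mlc-chat-config.json")


def missing_artifacts(model_path: Path) -> List[str]:
    """Hypothetical helper: list the expected files absent from model_path."""
    return [name for name in EXPECTED_FILES if not (model_path / name).is_file()]
```

An empty list means both files are present; a non-empty list names what is missing and can be reported to the user.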

Exceptions

Exception	Condition
RuntimeError	Raised when `MLC_DOWNLOAD_CACHE_POLICY` is `OFF` and a download would be required, or when the policy is `READONLY` and no cache entry is found.
ValueError	Raised when the `model_url` does not conform to the expected format (`HF://user/repo` or `https://huggingface.co/user/repo`), or when an MD5 checksum mismatch is detected for a downloaded file.
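
The interaction between the cache policy, `force_redo`, and the `RuntimeError` conditions above can be summarized in a small decision function. This is a sketch under stated assumptions, not the library's code: the function `plan_download` is hypothetical, the default policy of `ON` when the variable is unset is assumed, and so is the choice to serve from cache under `READONLY` even when `force_redo` is set.

```python
import os
from typing import Mapping, Optional

VALID_POLICIES = ("ON", "OFF", "REDO", "READONLY")


def plan_download(
    cached: bool,
    force_redo: Optional[bool] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Hypothetical helper: return "use_cache" or "download", or raise."""
    # Assumption: an unset variable behaves like ON.
    policy = env.get("MLC_DOWNLOAD_CACHE_POLICY", "ON")
    if policy not in VALID_POLICIES:
        raise ValueError(f"Unknown cache policy: {policy!r}")
    if force_redo is None:
        # With no explicit argument, only the REDO policy forces a re-download.
        force_redo = policy == "REDO"
    if policy == "READONLY":
        if cached:
            return "use_cache"
        raise RuntimeError("Policy is READONLY and no cache entry was found")
    if cached and not force_redo:
        return "use_cache"
    if policy == "OFF":
        raise RuntimeError("Policy is OFF but a download would be required")
    return "download"
```

Passing `env` explicitly keeps the function easy to exercise without mutating the real process environment.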

Usage Examples

Basic Usage

from mlc_llm.support.download_cache import download_and_cache_mlc_weights

# Download model weights from HuggingFace with default settings
model_path = download_and_cache_mlc_weights("HF://mlc-ai/Llama-2-7b-chat-q4f16_1-MLC")
print(f"Weights cached at: {model_path}")

Parallel Download with Custom Workers

from mlc_llm.support.download_cache import download_and_cache_mlc_weights

# Use 8 parallel processes for faster download on high-bandwidth connections
model_path = download_and_cache_mlc_weights(
    model_url="HF://mlc-ai/Llama-2-7b-chat-q4f16_1-MLC",
    num_processes=8,
)

Force Re-download

from mlc_llm.support.download_cache import download_and_cache_mlc_weights

# Force re-download even if weights are already cached
model_path = download_and_cache_mlc_weights(
    model_url="HF://mlc-ai/Llama-2-7b-chat-q4f16_1-MLC",
    force_redo=True,
)

Related Pages

Implements Principle

Principle:Mlc_ai_Mlc_llm_Model_Weight_Acquisition
