Implementation:Mlc ai Mlc llm Download and cache mlc weights
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Model_Deployment, Data_Management |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tool for obtaining pre-trained model weights from remote HuggingFace repositories with local caching and MD5 integrity validation, provided by MLC-LLM.
Description
The download_and_cache_mlc_weights function handles the complete lifecycle of downloading MLC-format model weights from HuggingFace. It parses a model URL (either HF://user/repo or https://huggingface.co/user/repo), checks a multi-level cache hierarchy (read-only caches first, then the writable cache), and, if the weights are not already cached, performs a parallel download. The download first clones the Git repository metadata (skipping large binary files), then reads the tensor-cache.json manifest to discover the weight shards, and finally downloads the shards in parallel using a process pool, verifying each one against its MD5 checksum. The completed download is atomically moved from a temporary directory to the cache location. The function respects the MLC_DOWNLOAD_CACHE_POLICY environment variable, which supports the ON, OFF, REDO, and READONLY modes.
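The verify-then-publish step described above can be illustrated with a minimal sketch. It is not the MLC-LLM implementation: the helper names md5_of_file and verify_and_publish, and the expected_md5 mapping from file name to checksum, are hypothetical and introduced here only for illustration. The sketch assumes the temporary directory and the cache directory live on the same filesystem, which is what makes the final rename atomic.

```python
import hashlib
import os
from pathlib import Path
from typing import Dict


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_publish(tmp_dir: Path, cache_dir: Path, expected_md5: Dict[str, str]) -> Path:
    """Hypothetical helper: verify every file, then move the directory into the cache.

    expected_md5 maps a file name inside tmp_dir to its expected hex digest
    (an assumed shape, not the actual tensor-cache.json schema).
    """
    for name, expected in expected_md5.items():
        actual = md5_of_file(tmp_dir / name)
        if actual != expected:
            raise ValueError(f"MD5 mismatch for {name}: expected {expected}, got {actual}")
    cache_dir.parent.mkdir(parents=True, exist_ok=True)
    # A rename within one filesystem is atomic, so readers never see a partial cache entry.
    os.replace(tmp_dir, cache_dir)
    return cache_dir
```

Publishing only after all checksums pass means an interrupted or corrupted download leaves the cache location untouched.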
Usage
Use this function when you need to programmatically download pre-quantized MLC model weights before running weight conversion or starting an inference engine. It is the primary entry point for fetching weights in the MLC-LLM compilation pipeline.
Code Reference
Source Location
- Repository: MLC-LLM
- File: python/mlc_llm/support/download_cache.py (lines 127-199)
Signature
def download_and_cache_mlc_weights(
    model_url: str,
    num_processes: int = 4,
    force_redo: Optional[bool] = None,
) -> Path:
    """Download weights for a model from the HuggingFace Git LFS repo."""
Import
from mlc_llm.support.download_cache import download_and_cache_mlc_weights
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_url | str | Yes | The HuggingFace model URL. Must start with HF:// or https://huggingface.co/ and follow the format prefix/user/repo. Example: HF://mlc-ai/Llama-2-7b-chat-q4f16_1-MLC. |
| num_processes | int | No (default: 4) | Number of parallel worker processes used to download weight shard files. Controls the concurrency of the ProcessPoolExecutor. |
| force_redo | Optional[bool] | No (default: None) | Whether to force re-downloading weights even if they already exist in the cache. When None, the behavior is determined by the MLC_DOWNLOAD_CACHE_POLICY environment variable: if the policy is REDO, existing cache entries are deleted and re-downloaded. |
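The URL format required for model_url can be made concrete with a small validation sketch. The helper split_model_url is hypothetical and is not part of MLC-LLM; it only mirrors the two accepted prefixes and the user/repo shape stated in the table above.

```python
from typing import Tuple

# The two prefixes accepted for model_url, as listed in the Inputs table.
HF_PREFIXES = ("HF://", "https://huggingface.co/")


def split_model_url(model_url: str) -> Tuple[str, str]:
    """Hypothetical helper: split a model URL into (user, repo) or raise ValueError."""
    for prefix in HF_PREFIXES:
        if model_url.startswith(prefix):
            parts = model_url[len(prefix):].strip("/").split("/")
            # Exactly two non-empty components are expected: user and repo.
            if len(parts) == 2 and all(parts):
                return parts[0], parts[1]
            break
    raise ValueError(
        f"Invalid model URL {model_url!r}; expected HF://user/repo "
        "or https://huggingface.co/user/repo"
    )
```

Validating the URL up front lets a caller surface a malformed model name before any network or cache work starts.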
Outputs
| Name | Type | Description |
|---|---|---|
| return value | Path | The local filesystem path to the directory containing the downloaded and cached model weights. This directory contains the weight shard files, tensor-cache.json, mlc-chat-config.json, and other model artifacts. |
Exceptions
| Exception | Condition |
|---|---|
| RuntimeError | Raised when MLC_DOWNLOAD_CACHE_POLICY is OFF and a download would be required, or when the policy is READONLY and no cache entry is found. |
| ValueError | Raised when the model_url does not conform to the expected format (HF://user/repo or https://huggingface.co/user/repo), or when an MD5 checksum mismatch is detected for a downloaded file. |
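The interaction between force_redo, the cache state, and MLC_DOWNLOAD_CACHE_POLICY can be sketched as a small decision function. This is an illustration of the rules documented on this page, not the library's code: plan_fetch and its return strings are hypothetical, the default policy of ON when the variable is unset is an assumption, and so is the choice to reject a forced re-download under READONLY.

```python
import os
from typing import Optional

VALID_POLICIES = ("ON", "OFF", "REDO", "READONLY")


def plan_fetch(
    is_cached: bool,
    force_redo: Optional[bool] = None,
    policy: Optional[str] = None,
) -> str:
    """Hypothetical helper: decide between using the cache and downloading."""
    if policy is None:
        # Assumption: treat an unset variable as ON.
        policy = os.environ.get("MLC_DOWNLOAD_CACHE_POLICY", "ON")
    if policy not in VALID_POLICIES:
        raise ValueError(f"Unknown cache policy: {policy!r}")
    if force_redo is None:
        # As documented: with force_redo=None, the REDO policy forces a re-download.
        force_redo = policy == "REDO"
    if is_cached and not force_redo:
        return "use_cache"
    # From here on a download is required.
    if policy == "OFF":
        raise RuntimeError("Download required but MLC_DOWNLOAD_CACHE_POLICY=OFF")
    if policy == "READONLY":
        # Assumption: READONLY never writes, so it also rejects a forced re-download.
        raise RuntimeError("No usable cache entry and MLC_DOWNLOAD_CACHE_POLICY=READONLY")
    return "redownload" if is_cached else "download"
```

Callers that run in restricted environments can catch RuntimeError from the real function in the same way and report that the weights must be pre-populated in the cache.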
Usage Examples
Basic Usage
from mlc_llm.support.download_cache import download_and_cache_mlc_weights
# Download model weights from HuggingFace with default settings
model_path = download_and_cache_mlc_weights("HF://mlc-ai/Llama-2-7b-chat-q4f16_1-MLC")
print(f"Weights cached at: {model_path}")
Parallel Download with Custom Workers
from mlc_llm.support.download_cache import download_and_cache_mlc_weights
# Use 8 parallel processes for faster download on high-bandwidth connections
model_path = download_and_cache_mlc_weights(
    model_url="HF://mlc-ai/Llama-2-7b-chat-q4f16_1-MLC",
    num_processes=8,
)
Force Re-download
from mlc_llm.support.download_cache import download_and_cache_mlc_weights
# Force re-download even if weights are already cached
model_path = download_and_cache_mlc_weights(
    model_url="HF://mlc-ai/Llama-2-7b-chat-q4f16_1-MLC",
    force_redo=True,
)