

Implementation:Mlc ai Mlc llm Download and cache mlc weights

From Leeroopedia


Knowledge Sources
Domains Deep_Learning, Model_Deployment, Data_Management
Last Updated 2026-02-09 00:00 GMT

Overview

Concrete tool for obtaining pre-trained model weights from remote HuggingFace repositories with local caching and MD5 integrity validation, provided by MLC-LLM.

Description

The download_and_cache_mlc_weights function handles the complete lifecycle of downloading MLC-format model weights from HuggingFace. It parses a model URL (either HF://user/repo or https://huggingface.co/user/repo) and checks a multi-level cache hierarchy (read-only caches, then the writable cache). If the weights are not already cached, it performs a parallel download.

The download first clones the Git repository metadata (skipping large binary files) and reads the tensor-cache.json manifest to discover the weight shards. It then downloads the shards in parallel using a process pool, with MD5 checksum verification. Only after all shards have been fetched is the temporary download directory moved into the cache location.

The function respects the MLC_DOWNLOAD_CACHE_POLICY environment variable, which supports the ON, OFF, REDO, and READONLY modes.
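A minimal sketch of the URL-parsing step, assuming only the two prefixes named above. parse_model_url is a hypothetical helper for illustration, not part of MLC-LLM; it splits the URL into (user, repo) and raises ValueError for any other shape, in line with the Exceptions table.

```python
from typing import Tuple

# Hypothetical helper: the accepted prefixes are the two named in the Description.
_PREFIXES = ("HF://", "https://huggingface.co/")


def parse_model_url(model_url: str) -> Tuple[str, str]:
    """Split an MLC model URL into (user, repo), rejecting anything else."""
    for prefix in _PREFIXES:
        if model_url.startswith(prefix):
            parts = model_url[len(prefix):].split("/")
            # Exactly two non-empty components must follow the prefix: user/repo.
            if len(parts) == 2 and all(parts):
                return parts[0], parts[1]
            break
    raise ValueError(f"Invalid model URL: {model_url}")
```

For example, parse_model_url("HF://mlc-ai/Llama-2-7b-chat-q4f16_1-MLC") returns ("mlc-ai", "Llama-2-7b-chat-q4f16_1-MLC").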

Usage

Use this function when you need to download pre-quantized MLC model weights programmatically, for example before compiling a model library or starting an inference engine. It is the primary entry point for fetching weights in the MLC-LLM compilation pipeline.

Code Reference

Source Location

  • Repository: MLC-LLM
  • File: python/mlc_llm/support/download_cache.py (lines 127-199)

Signature

def download_and_cache_mlc_weights(
    model_url: str,
    num_processes: int = 4,
    force_redo: Optional[bool] = None,
) -> Path:
    """Download weights for a model from the HuggingFace Git LFS repo."""

Import

from mlc_llm.support.download_cache import download_and_cache_mlc_weights

I/O Contract

Inputs

Name | Type | Required | Description
model_url | str | Yes | The HuggingFace model URL. It must start with HF:// or https://huggingface.co/, followed by user/repo. Example: HF://mlc-ai/Llama-2-7b-chat-q4f16_1-MLC.
num_processes | int | No (default: 4) | Number of parallel worker processes used to download weight shard files; sets the worker count of the ProcessPoolExecutor.
force_redo | Optional[bool] | No (default: None) | Whether to re-download the weights even if they already exist in the cache. When None, the MLC_DOWNLOAD_CACHE_POLICY environment variable decides: under the REDO policy, existing cache entries are deleted and downloaded again.
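How an explicit force_redo argument and MLC_DOWNLOAD_CACHE_POLICY combine can be sketched as follows. resolve_force_redo is hypothetical; treating an unset variable as ON and rejecting unknown policy values are assumptions of this sketch, not documented behavior.

```python
import os
from typing import Optional

# The four policy values listed in the Description.
VALID_POLICIES = {"ON", "OFF", "REDO", "READONLY"}


def resolve_force_redo(force_redo: Optional[bool], policy: Optional[str] = None) -> bool:
    """Hypothetical helper: an explicit argument wins; None defers to the policy."""
    if policy is None:
        # Assumption: an unset variable behaves like "ON".
        policy = os.environ.get("MLC_DOWNLOAD_CACHE_POLICY", "ON")
    if policy not in VALID_POLICIES:
        # This sketch rejects unknown values rather than guessing.
        raise ValueError(f"Unknown MLC_DOWNLOAD_CACHE_POLICY: {policy}")
    if force_redo is not None:
        return force_redo
    return policy == "REDO"
```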

Outputs

Name | Type | Description
return value | Path | The local filesystem path to the directory containing the downloaded and cached model weights. This directory contains the weight shard files, tensor-cache.json, mlc-chat-config.json, and other model artifacts.
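To inspect what the returned directory holds, a hypothetical helper can read the shard list from tensor-cache.json. The "records" and "dataPath" keys are assumptions about the manifest layout, not something this page guarantees.

```python
import json
from pathlib import Path
from typing import List


def list_weight_shards(model_path: Path) -> List[Path]:
    """Hypothetical helper: return the shard files named in tensor-cache.json.

    Assumes the manifest is a JSON object whose "records" list holds one entry
    per shard, each with a "dataPath" relative to the model directory.
    """
    with (model_path / "tensor-cache.json").open(encoding="utf-8") as f:
        records = json.load(f)["records"]
    return [model_path / record["dataPath"] for record in records]
```

Applied to the result of download_and_cache_mlc_weights, this yields absolute paths to each shard, which can then be checked with Path.is_file().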

Exceptions

Exception | Condition
RuntimeError | Raised when MLC_DOWNLOAD_CACHE_POLICY is OFF and a download would be required, or when the policy is READONLY and no cache entry is found.
ValueError | Raised when the model_url does not conform to the expected format (HF://user/repo or https://huggingface.co/user/repo), or when an MD5 checksum mismatch is detected for a downloaded file.
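The MD5 check behind the second ValueError case can be sketched with hashlib. verify_md5 is a hypothetical helper; skipping verification when a record carries no checksum is an assumption of this sketch.

```python
import hashlib
from pathlib import Path
from typing import Optional


def verify_md5(path: Path, expected_md5: Optional[str], chunk_size: int = 1 << 20) -> None:
    """Hypothetical helper: raise ValueError if the file's MD5 differs from expected."""
    if expected_md5 is None:
        return  # Assumption: records without a checksum are accepted unverified.
    digest = hashlib.md5()
    with path.open("rb") as f:
        # Hash in chunks so multi-gigabyte shards are never fully loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    actual = digest.hexdigest()
    if actual != expected_md5:
        raise ValueError(f"MD5 mismatch for {path}: expected {expected_md5}, got {actual}")
```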

Usage Examples

Basic Usage

from mlc_llm.support.download_cache import download_and_cache_mlc_weights

# Download model weights from HuggingFace with default settings
model_path = download_and_cache_mlc_weights("HF://mlc-ai/Llama-2-7b-chat-q4f16_1-MLC")
print(f"Weights cached at: {model_path}")

Parallel Download with Custom Workers

from mlc_llm.support.download_cache import download_and_cache_mlc_weights

# Use 8 parallel processes for faster download on high-bandwidth connections
model_path = download_and_cache_mlc_weights(
    model_url="HF://mlc-ai/Llama-2-7b-chat-q4f16_1-MLC",
    num_processes=8,
)
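The download-then-move flow from the Description, with concurrency bounded by a worker count, can be sketched as below. This is an illustration under stated assumptions: it swaps in ThreadPoolExecutor for the ProcessPoolExecutor the function uses, and the hypothetical fetch(name, dest) callable stands in for the HTTP download of one shard.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, List


def download_shards_then_move(
    shard_names: List[str],
    fetch: Callable[[str, Path], None],
    cache_dir: Path,
    num_workers: int = 4,
) -> Path:
    """Hypothetical sketch: fetch shards concurrently into a temp dir, then move it."""
    if cache_dir.exists():
        raise FileExistsError(f"{cache_dir} already exists")
    with tempfile.TemporaryDirectory() as tmp_root:
        staging = Path(tmp_root) / "staging"
        staging.mkdir()
        # Threads here; the real function uses a process pool (num_processes).
        with ThreadPoolExecutor(max_workers=num_workers) as pool:
            futures = [pool.submit(fetch, name, staging / name) for name in shard_names]
            for future in futures:
                future.result()  # Propagate any worker exception.
        cache_dir.parent.mkdir(parents=True, exist_ok=True)
        # Only a fully populated staging directory is moved into the cache.
        shutil.move(str(staging), str(cache_dir))
    return cache_dir
```

Staging in a temporary directory means a failed shard leaves the cache untouched: the exception propagates and the staging directory is discarded.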

Force Re-download

from mlc_llm.support.download_cache import download_and_cache_mlc_weights

# Force re-download even if weights are already cached
model_path = download_and_cache_mlc_weights(
    model_url="HF://mlc-ai/Llama-2-7b-chat-q4f16_1-MLC",
    force_redo=True,
)
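A sketch of the cache lookup that force_redo affects, assuming read-only cache directories are consulted first and that an entry counts as complete when it contains mlc-chat-config.json. find_cached_weights is hypothetical; it returns None when a download is required.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional


def find_cached_weights(
    readonly_dirs: Iterable[Path],
    writable_dir: Path,
    force_redo: bool,
) -> Optional[Path]:
    """Hypothetical sketch of the cache lookup; None means a download is needed."""
    for candidate in readonly_dirs:
        # Assumption: mlc-chat-config.json marks a complete read-only entry.
        if (candidate / "mlc-chat-config.json").is_file():
            return candidate  # Read-only caches are used as-is, never deleted.
    if writable_dir.exists():
        if not force_redo:
            return writable_dir
        shutil.rmtree(writable_dir)  # force_redo: discard the old writable entry.
    return None
```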

Related Pages

Implements Principle
