Implementation:Vllm project Vllm Snapshot Download Draft

From Leeroopedia


Knowledge Sources
Domains Model Management, Speculative Decoding, Artifact Resolution
Last Updated 2026-02-08 13:00 GMT

Overview

A concrete tool, provided by the huggingface_hub library, for downloading draft model or EAGLE checkpoint weights from the Hugging Face Hub.

Description

The huggingface_hub.snapshot_download function downloads an entire model repository snapshot from Hugging Face Hub to a local cache directory and returns the local path. In the vLLM speculative decoding workflow, this function (or vLLM's internal equivalent) is used to acquire EAGLE checkpoints or draft model weights before they can be referenced in the speculative_config. For methods that do not require external weights (n-gram, MTP), this step is skipped entirely.

vLLM also accepts Hugging Face Hub model identifiers directly in the speculative_config["model"] field. When a Hub identifier is provided (e.g., "yuhuili/EAGLE3-LLaMA3.1-Instruct-8B"), vLLM resolves and downloads the model internally during engine initialization. Explicit use of snapshot_download is useful when pre-caching models or when working in environments with restricted network access.
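The choice between a local path and a Hub identifier can be made in a small wrapper. The sketch below is illustrative only: resolve_speculative_model is a hypothetical helper, not part of vLLM or huggingface_hub. It relies on the local_files_only keyword of snapshot_download (a real upstream parameter that is not shown in the abbreviated signature on this page), which makes the call read from the local cache and raise an error instead of contacting the Hub.

```python
from pathlib import Path


def resolve_speculative_model(model_ref: str, offline: bool = False) -> str:
    """Return a local directory for model_ref (hypothetical helper)."""
    # An existing directory is assumed to be an already-downloaded snapshot.
    candidate = Path(model_ref)
    if candidate.is_dir():
        return str(candidate.resolve())

    # Imported lazily so the local-path branch works without huggingface_hub.
    from huggingface_hub import snapshot_download

    # With local_files_only=True, nothing is fetched over the network; the call
    # fails if the snapshot is not already present in the cache.
    return snapshot_download(model_ref, local_files_only=offline)
```

In a network-restricted deployment, one option is to run the download once on a machine with access, copy or mount the cache, and then call the helper with offline=True inside the inference process.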

Usage

Use this function to pre-download EAGLE checkpoints or draft model weights before constructing the vLLM engine. This is especially useful for:

  • Pre-warming model caches in production deployments
  • Downloading models in environments where the inference process lacks network access
  • Verifying model availability before starting long-running inference jobs
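As a rough illustration of the availability check in the last bullet, the sketch below inspects a downloaded snapshot directory before a long-running job starts. The helper name missing_snapshot_parts and the list of expected files are assumptions made for this example: it presumes a usable checkpoint has a config.json and at least one weight file with a common suffix, which may not hold for every repository.

```python
from pathlib import Path

# Assumed weight-file suffixes; adjust to match the repositories you use.
WEIGHT_SUFFIXES = (".safetensors", ".bin", ".pt")


def missing_snapshot_parts(snapshot_dir: str) -> list[str]:
    """Return a list of missing pieces; an empty list means the check passed."""
    root = Path(snapshot_dir)
    if not root.is_dir():
        return ["directory"]

    problems = []
    if not (root / "config.json").is_file():
        problems.append("config.json")

    # is_file() follows symlinks, so entries in the Hub cache layout
    # (symlinks into the blobs directory) are counted as well.
    has_weights = any(
        p.is_file() and p.suffix in WEIGHT_SUFFIXES for p in root.iterdir()
    )
    if not has_weights:
        problems.append("weights")
    return problems
```

The function only looks at top-level files and does not validate file contents, so it is a cheap pre-flight check rather than a guarantee that the engine will load the checkpoint.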

Code Reference

Source Location

  • Repository: huggingface_hub (external library)
  • File: src/huggingface_hub/_snapshot_download.py (upstream); usage example at examples/offline_inference/spec_decode.py in the vLLM repository

Signature (abbreviated; the upstream function accepts additional keyword arguments)

def snapshot_download(
    repo_id: str,
    *,
    repo_type: str | None = None,
    revision: str | None = None,
    cache_dir: str | Path | None = None,
    local_dir: str | Path | None = None,
    token: str | bool | None = None,
    ignore_patterns: str | list[str] | None = None,
    allow_patterns: str | list[str] | None = None,
) -> str:
    """Download a full snapshot of a repo from the Hub.

    Returns the local directory path where the snapshot was downloaded.
    """
    ...
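
The revision and allow_patterns parameters can be combined to pin a download to a specific commit and to skip files the engine does not need. The sketch below only builds the keyword arguments; draft_download_kwargs is a hypothetical helper, and the glob patterns are an assumption about which files a checkpoint repository contains, so they may need adjusting (for example, to include tokenizer files stored under other extensions).

```python
def draft_download_kwargs(repo_id, revision=None, essential_only=True):
    """Build keyword arguments for snapshot_download (hypothetical helper)."""
    kwargs = {"repo_id": repo_id}

    # Pinning a branch, tag, or commit hash keeps deployments reproducible.
    if revision is not None:
        kwargs["revision"] = revision

    # Assumed patterns: JSON configs plus common weight formats.
    if essential_only:
        kwargs["allow_patterns"] = ["*.json", "*.safetensors", "*.bin"]
    return kwargs


# Usage (requires huggingface_hub and network access, so it is left commented):
# from huggingface_hub import snapshot_download
# path = snapshot_download(**draft_download_kwargs("yuhuili/EAGLE3-LLaMA3.1-Instruct-8B"))
```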

Import

from huggingface_hub import snapshot_download

I/O Contract

Inputs

Name | Type | Required | Description
repo_id | str | Yes | The Hugging Face Hub repository identifier (e.g., "yuhuili/EAGLE3-LLaMA3.1-Instruct-8B" for EAGLE3, or "meta-llama/Llama-3.2-1B-Instruct" for a draft model).
repo_type | str or None | No | Type of repository. Defaults to "model".
revision | str or None | No | Git revision (branch, tag, or commit hash) to download. Defaults to the main branch.
cache_dir | str or Path or None | No | Directory for caching downloaded files. Defaults to the Hugging Face cache directory (~/.cache/huggingface/hub).
local_dir | str or Path or None | No | If set, files are placed into this directory instead of the cache structure.
token | str or bool or None | No | Authentication token for private or gated repositories.
ignore_patterns | str or list[str] or None | No | Glob pattern(s); files matching any of them are not downloaded.
allow_patterns | str or list[str] or None | No | Glob pattern(s); if set, only files matching at least one of them are downloaded.

Outputs

Name | Type | Description
local_path | str | The local filesystem path to the downloaded snapshot directory. This path is passed as the "model" value in the speculative_config dictionary.

Usage Examples

Download EAGLE Checkpoint

from huggingface_hub import snapshot_download

# Download EAGLE head for LLaMA 3.1 8B Instruct
eagle_path = snapshot_download("yuhuili/EAGLE-LLaMA3.1-Instruct-8B")

speculative_config = {
    "method": "eagle",
    "model": eagle_path,
    "num_speculative_tokens": 3,
}

Download EAGLE3 Checkpoint

from huggingface_hub import snapshot_download

# Download EAGLE3 head for LLaMA 3.1 8B Instruct
eagle3_path = snapshot_download("yuhuili/EAGLE3-LLaMA3.1-Instruct-8B")

speculative_config = {
    "method": "eagle3",
    "model": eagle3_path,
    "num_speculative_tokens": 3,
}

Download Draft Model

from huggingface_hub import snapshot_download

# Download a smaller model from the same family
draft_path = snapshot_download("meta-llama/Llama-3.2-1B-Instruct")

speculative_config = {
    "method": "draft_model",
    "model": draft_path,
    "num_speculative_tokens": 3,
}
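
The configuration dictionaries in these examples still have to be handed to the engine. The sketch below shows one way this might look, assuming a vLLM version whose LLM constructor accepts a speculative_config dictionary. Both function names are hypothetical, and the target model identifier in the usage comment is only an illustration.

```python
def build_speculative_config(method, model_path, num_speculative_tokens=3):
    """Assemble the dictionary used in the examples on this page."""
    return {
        "method": method,
        "model": model_path,
        "num_speculative_tokens": num_speculative_tokens,
    }


def create_engine(target_model, speculative_config):
    """Construct a vLLM engine with speculative decoding enabled."""
    # Imported lazily: vLLM is a heavy dependency and usually needs a GPU.
    from vllm import LLM

    return LLM(model=target_model, speculative_config=speculative_config)


# Usage (not executed here; needs vLLM, suitable hardware, and model access):
# config = build_speculative_config("eagle3", eagle3_path)
# llm = create_engine("meta-llama/Llama-3.1-8B-Instruct", config)
```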

N-gram and MTP: No Download Needed

# N-gram requires no additional model download
speculative_config_ngram = {
    "method": "ngram",
    "num_speculative_tokens": 3,
    "prompt_lookup_max": 5,
    "prompt_lookup_min": 2,
}

# MTP uses the target model's built-in heads; no download needed
speculative_config_mtp = {
    "method": "mtp",
    "num_speculative_tokens": 2,
}
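
The split between methods that need external weights and those that do not can be captured in a small guard. The sketch below is an illustration based only on the methods named on this page; requires_download is a hypothetical helper, and the two sets are not an exhaustive list of the methods vLLM supports.

```python
# Methods from this page that reference a separate checkpoint or draft model.
METHODS_WITH_EXTERNAL_WEIGHTS = {"eagle", "eagle3", "draft_model"}
# Methods from this page that need no additional download.
METHODS_WITHOUT_EXTERNAL_WEIGHTS = {"ngram", "mtp"}


def requires_download(speculative_config: dict) -> bool:
    """Report whether the configured method needs externally downloaded weights."""
    method = speculative_config.get("method")
    if method in METHODS_WITH_EXTERNAL_WEIGHTS:
        return True
    if method in METHODS_WITHOUT_EXTERNAL_WEIGHTS:
        return False
    # Unknown methods are rejected rather than guessed at.
    raise ValueError(f"Unrecognized speculative method: {method!r}")
```

A deployment script could call this before snapshot_download so that the download step is skipped for n-gram and MTP configurations.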

Related Pages

Implements Principle
