Implementation:Vllm project Vllm Snapshot Download Draft

Knowledge Sources	vLLM, vLLM Docs, Hugging Face Hub API
Domains	Model Management, Speculative Decoding, Artifact Resolution
Last Updated	2026-02-08 13:00 GMT

Overview

Concrete tool, provided by the huggingface_hub library, for downloading draft model or EAGLE checkpoint weights from the Hugging Face Hub.

Description

The huggingface_hub.snapshot_download function downloads an entire model repository snapshot from the Hugging Face Hub to a local cache directory and returns the local path. In the vLLM speculative decoding workflow, this function (or vLLM's internal equivalent) is used to acquire EAGLE checkpoints or draft model weights before they are referenced in the speculative_config. For methods that do not require external weights (n-gram, MTP), this step is skipped entirely.
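The branching described above (download for EAGLE and draft models, skip for n-gram and MTP) can be sketched as a small helper. This is a minimal sketch, not vLLM code: `METHODS_WITHOUT_WEIGHTS` and `acquire_draft_weights` are hypothetical names, the method strings are the ones used in this page's examples, and the `download` parameter defaults to `huggingface_hub.snapshot_download` but can be replaced with a stub.

```python
from typing import Callable, Optional

# Hypothetical: methods from this page's examples that need no external weights.
METHODS_WITHOUT_WEIGHTS = {"ngram", "mtp"}


def acquire_draft_weights(
    method: str,
    repo_id: Optional[str] = None,
    download: Optional[Callable[..., str]] = None,
) -> Optional[str]:
    """Return a local path for draft/EAGLE weights, or None if none are needed."""
    if method in METHODS_WITHOUT_WEIGHTS:
        return None  # n-gram and MTP skip the download step entirely
    if repo_id is None:
        raise ValueError(f"method {method!r} needs a repo_id for its weights")
    if download is None:
        # Imported lazily so the helper can be exercised without huggingface_hub.
        from huggingface_hub import snapshot_download
        download = snapshot_download
    return download(repo_id)
```

Injecting `download` keeps the decision logic separate from the network call, so the same helper works with the real function in production and with a stub in tests.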

vLLM also accepts Hugging Face Hub model identifiers directly in the speculative_config["model"] field. When a Hub identifier is provided (e.g., "yuhuili/EAGLE3-LLaMA3.1-Instruct-8B"), vLLM resolves and downloads the model internally during engine initialization. Explicit use of snapshot_download is useful when pre-caching models or when working in environments with restricted network access.
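A minimal sketch of combining both approaches: use a pre-cached local snapshot when one exists, otherwise hand vLLM the Hub identifier. It assumes the `local_files_only=True` keyword of `snapshot_download`, which resolves a snapshot from the local cache without network access, and assumes that a cache miss raises `LocalEntryNotFoundError`, which recent huggingface_hub releases define as a subclass of `FileNotFoundError`. The helper name `model_ref_for_speculative_config` and its `download` parameter are hypothetical.

```python
from typing import Callable, Optional


def model_ref_for_speculative_config(
    repo_id: str,
    download: Optional[Callable[..., str]] = None,
) -> str:
    """Prefer an already-cached local snapshot; otherwise fall back to the Hub id."""
    if download is None:
        from huggingface_hub import snapshot_download
        download = snapshot_download
    try:
        # local_files_only=True resolves from the cache without any network access.
        return download(repo_id, local_files_only=True)
    except FileNotFoundError:
        # Assumption: the "not cached" error (LocalEntryNotFoundError) subclasses
        # FileNotFoundError, as in recent huggingface_hub releases.
        return repo_id  # vLLM will resolve and download it at engine start-up
```

In a restricted environment the fallback only defers the problem, because vLLM would then try to download the Hub identifier itself; pair this with a pre-caching step run where the network is available.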

Usage

Use this function to pre-download EAGLE checkpoints or draft model weights before constructing the vLLM engine. This is especially useful for:

Pre-warming model caches in production deployments
Downloading models in environments where the inference process lacks network access
Verifying model availability before starting long-running inference jobs
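For the pre-warming and availability-checking cases above, a hedged sketch: download every checkpoint a deployment will need and collect failures (gated repositories without a token, typos in repo ids, network errors) instead of stopping at the first one. The helper name `prewarm` and its injectable `download` parameter are hypothetical; by default it calls `huggingface_hub.snapshot_download`.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def prewarm(
    repo_ids: Iterable[str],
    download: Optional[Callable[..., str]] = None,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Download each repo; return (paths, failures) rather than failing fast."""
    if download is None:
        from huggingface_hub import snapshot_download
        download = snapshot_download
    paths: Dict[str, str] = {}
    failures: Dict[str, str] = {}
    for repo_id in repo_ids:
        try:
            paths[repo_id] = download(repo_id)
        except Exception as exc:  # gated/missing repos, network errors, etc.
            failures[repo_id] = f"{type(exc).__name__}: {exc}"
    return paths, failures
```

A deployment script could run this before starting a long job and abort if `failures` is non-empty, so that missing weights surface immediately rather than during engine initialization.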

Code Reference

Source Location

Repository: huggingface_hub (external library)
File: huggingface_hub/_snapshot_download.py (upstream); usage example at examples/offline_inference/spec_decode.py

Signature

from pathlib import Path

def snapshot_download(
    repo_id: str,
    *,
    repo_type: str | None = None,
    revision: str | None = None,
    cache_dir: str | Path | None = None,
    local_dir: str | Path | None = None,
    token: str | bool | None = None,
    ignore_patterns: str | list[str] | None = None,
    allow_patterns: str | list[str] | None = None,
    # ... further keyword-only parameters omitted (e.g. local_files_only, max_workers)
) -> str:
    """Download a full snapshot of a repo from the Hub.

    Returns the local directory path where the snapshot was downloaded.
    """
    ...

Import

from huggingface_hub import snapshot_download

I/O Contract

Inputs

Name	Type	Required	Description
repo_id	`str`	Yes	The Hugging Face Hub repository identifier (e.g., `"yuhuili/EAGLE3-LLaMA3.1-Instruct-8B"` for EAGLE3, or `"meta-llama/Llama-3.2-1B-Instruct"` for a draft model).
repo_type	`str or None`	No	Type of repository. Defaults to `"model"`.
revision	`str or None`	No	Git revision (branch, tag, or commit hash) to download. Defaults to the main branch.
cache_dir	`str or Path or None`	No	Directory for caching downloaded files. Defaults to the Hugging Face cache directory (`~/.cache/huggingface/hub`).
local_dir	`str or Path or None`	No	If set, files are placed into this directory instead of the cache structure.
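token	`str or bool or None`	No	Authentication token for private or gated repositories.
allow_patterns	`str or list[str] or None`	No	Glob pattern(s). If set, only files matching at least one pattern are downloaded.
ignore_patterns	`str or list[str] or None`	No	Glob pattern(s). Files matching any of these patterns are not downloaded.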
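To show how `allow_patterns` and `ignore_patterns` narrow a snapshot, here is a simplified approximation using Python's `fnmatch`. It is an illustration only: the helper `preview_selected_files` and the file listing are hypothetical, and huggingface_hub's own filtering may differ in edge cases. Note that with `fnmatch`, `*` also matches `/`, so `"*.json"` selects JSON files in subfolders too.

```python
from fnmatch import fnmatch
from typing import List, Optional, Sequence, Union

Patterns = Optional[Union[str, Sequence[str]]]


def _as_list(patterns: Patterns) -> Optional[List[str]]:
    # Accept a single pattern string or a sequence of patterns, like the signature.
    if patterns is None:
        return None
    return [patterns] if isinstance(patterns, str) else list(patterns)


def preview_selected_files(
    files: Sequence[str],
    allow_patterns: Patterns = None,
    ignore_patterns: Patterns = None,
) -> List[str]:
    """Approximate which repo files a snapshot would keep (simplified rules)."""
    allow = _as_list(allow_patterns)
    ignore = _as_list(ignore_patterns)
    selected = []
    for name in files:
        if allow is not None and not any(fnmatch(name, p) for p in allow):
            continue  # not matched by any allow pattern
        if ignore is not None and any(fnmatch(name, p) for p in ignore):
            continue  # explicitly excluded
        selected.append(name)
    return selected


# Hypothetical listing resembling a Llama-style repo with an original/ folder.
files = [
    "config.json",
    "model.safetensors",
    "tokenizer.json",
    "original/consolidated.00.pth",
    "original/params.json",
]
print(preview_selected_files(files, ignore_patterns="original/*"))
# ['config.json', 'model.safetensors', 'tokenizer.json']
```

The corresponding real call would pass the same pattern to the library, e.g. `snapshot_download("meta-llama/Llama-3.2-1B-Instruct", ignore_patterns="original/*")`, to skip duplicate original-format weights.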

Outputs

Name	Type	Description
local_path	`str`	The local filesystem path to the downloaded snapshot directory. This path is passed as the `"model"` value in the `speculative_config` dictionary.
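Before the returned path goes into `speculative_config`, a cheap local sanity check can catch an empty or partial snapshot early. A minimal sketch, assuming that a usable draft or EAGLE checkpoint ships a `config.json` at its root; the helper name `speculative_config_from_snapshot` is hypothetical.

```python
import os
from typing import Any, Dict


def speculative_config_from_snapshot(
    method: str, local_path: str, num_speculative_tokens: int = 3
) -> Dict[str, Any]:
    """Build a speculative_config after a basic check of the snapshot directory."""
    config_file = os.path.join(local_path, "config.json")
    if not os.path.isfile(config_file):
        # Assumption: a usable draft/EAGLE checkpoint has config.json at its root.
        raise FileNotFoundError(f"no config.json in snapshot {local_path!r}")
    return {
        "method": method,
        "model": local_path,
        "num_speculative_tokens": num_speculative_tokens,
    }
```

The resulting dictionary has the same shape as the examples below, with the snapshot path as the `"model"` value.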

Usage Examples

Download EAGLE Checkpoint

from huggingface_hub import snapshot_download

# Download EAGLE head for LLaMA 3.1 8B Instruct
eagle_path = snapshot_download("yuhuili/EAGLE-LLaMA3.1-Instruct-8B")

speculative_config = {
    "method": "eagle",
    "model": eagle_path,
    "num_speculative_tokens": 3,
}

Download EAGLE3 Checkpoint

from huggingface_hub import snapshot_download

# Download EAGLE3 head for LLaMA 3.1 8B Instruct
eagle3_path = snapshot_download("yuhuili/EAGLE3-LLaMA3.1-Instruct-8B")

speculative_config = {
    "method": "eagle3",
    "model": eagle3_path,
    "num_speculative_tokens": 3,
}

Download Draft Model

from huggingface_hub import snapshot_download

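# Download a smaller model from the same family as the target (LLaMA 3.x).
# meta-llama repositories are gated: accept the license on the Hub and
# authenticate (token=... or a prior `huggingface-cli login`) first.
draft_path = snapshot_download("meta-llama/Llama-3.2-1B-Instruct")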

speculative_config = {
    "method": "draft_model",
    "model": draft_path,
    "num_speculative_tokens": 3,
}

N-gram and MTP: No Download Needed

# N-gram requires no additional model download
speculative_config_ngram = {
    "method": "ngram",
    "num_speculative_tokens": 3,
    "prompt_lookup_max": 5,
    "prompt_lookup_min": 2,
}

# MTP uses the target model's built-in heads; no download needed
speculative_config_mtp = {
    "method": "mtp",
    "num_speculative_tokens": 2,
}
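To illustrate what `prompt_lookup_min` and `prompt_lookup_max` control, here is a simplified prompt-lookup proposer in plain Python. It is an illustration, not vLLM's n-gram proposer: it tries suffix lengths from `prompt_lookup_max` down to `prompt_lookup_min`, takes the most recent earlier occurrence of that suffix in the context, and proposes up to `num_speculative_tokens` tokens that followed it. vLLM's implementation may choose among matches differently.

```python
from typing import List


def propose_ngram_tokens(
    tokens: List[int],
    num_speculative_tokens: int,
    prompt_lookup_min: int,
    prompt_lookup_max: int,
) -> List[int]:
    """Propose draft tokens by matching the context suffix against earlier text."""
    for n in range(prompt_lookup_max, prompt_lookup_min - 1, -1):
        if len(tokens) <= n:
            continue  # need the suffix plus at least one earlier position
        suffix = tokens[-n:]
        # Scan earlier start positions, most recent first, excluding the suffix itself.
        for start in range(len(tokens) - n - 1, -1, -1):
            if tokens[start:start + n] == suffix:
                return tokens[start + n:start + n + num_speculative_tokens]
    return []  # no match: nothing to speculate this step


print(propose_ngram_tokens([1, 2, 3, 4, 1, 2, 3], num_speculative_tokens=3,
                           prompt_lookup_min=2, prompt_lookup_max=5))
# [4, 1, 2]
```

Because the proposals come from the context itself, no draft weights are involved, which is why the n-gram method needs no download step.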

Related Pages

Implements Principle

Principle:Vllm_project_Vllm_Draft_Model_Acquisition
