Implementation:Vllm project Vllm Snapshot Download Draft
| Knowledge Sources | |
|---|---|
| Domains | Model Management, Speculative Decoding, Artifact Resolution |
| Last Updated | 2026-02-08 13:00 GMT |
Overview
Concrete tool, provided by the huggingface_hub library, for downloading draft model or EAGLE checkpoint weights from the Hugging Face Hub.
Description
The huggingface_hub.snapshot_download function downloads an entire model repository snapshot from Hugging Face Hub to a local cache directory and returns the local path. In the vLLM speculative decoding workflow, this function (or vLLM's internal equivalent) is used to acquire EAGLE checkpoints or draft model weights before they can be referenced in the speculative_config. For methods that do not require external weights (n-gram, MTP), this step is skipped entirely.
vLLM also accepts Hugging Face Hub model identifiers directly in the speculative_config["model"] field. When a Hub identifier is provided (e.g., "yuhuili/EAGLE3-LLaMA3.1-Instruct-8B"), vLLM resolves and downloads the model internally during engine initialization. Explicit use of snapshot_download is useful when pre-caching models or when working in environments with restricted network access.
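The split between methods that need external weights and methods that skip the download step can be sketched as a small lookup. The helper name `needs_external_weights` and the two sets below are illustrative assumptions built from the method names used on this page; they are not part of vLLM or huggingface_hub.

```python
# Hypothetical helper: not part of vLLM or huggingface_hub.
# The method names are the ones that appear on this page.
METHODS_WITH_EXTERNAL_WEIGHTS = {"eagle", "eagle3", "draft_model"}
METHODS_WITHOUT_EXTERNAL_WEIGHTS = {"ngram", "mtp"}


def needs_external_weights(method: str) -> bool:
    """Return True if the method requires a separately downloaded checkpoint."""
    if method in METHODS_WITH_EXTERNAL_WEIGHTS:
        return True
    if method in METHODS_WITHOUT_EXTERNAL_WEIGHTS:
        return False
    raise ValueError(f"Unknown speculative method: {method!r}")
```

A caller could use this check to decide whether to run the download step at all before building the speculative_config.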
Usage
Use this function to pre-download EAGLE checkpoints or draft model weights before constructing the vLLM engine. This is especially useful for:
- Pre-warming model caches in production deployments
- Downloading models in environments where the inference process lacks network access
- Verifying model availability before starting long-running inference jobs
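The use cases above can be sketched as a pre-fetch step that runs before the engine is built. The helper `prefetch_snapshots` and its injectable `download` parameter are assumptions made for illustration (the injection point lets the loop be exercised without network access); only `snapshot_download` itself comes from huggingface_hub.

```python
from typing import Callable, Dict, Iterable, Optional


def prefetch_snapshots(
    repo_ids: Iterable[str],
    download: Optional[Callable[[str], str]] = None,
) -> Dict[str, str]:
    """Download each repository once and return a repo_id -> local path map."""
    if download is None:
        # Imported lazily so the helper can be exercised without the library.
        from huggingface_hub import snapshot_download

        download = snapshot_download

    paths: Dict[str, str] = {}
    for repo_id in repo_ids:
        # Any download error propagates immediately, so a long-running job
        # fails before it starts rather than partway through.
        paths[repo_id] = download(repo_id)
    return paths


# Usage on a machine with network access (not executed here):
#   paths = prefetch_snapshots(["yuhuili/EAGLE3-LLaMA3.1-Instruct-8B"])
```

The returned paths can then be copied or mounted into an environment where the inference process has no network access.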
Code Reference
Source Location
- Repository: huggingface_hub (external library)
- File: src/huggingface_hub/_snapshot_download.py (upstream); usage example at examples/offline_inference/spec_decode.py
Signature
def snapshot_download(
    repo_id: str,
    *,
    repo_type: str | None = None,
    revision: str | None = None,
    cache_dir: str | Path | None = None,
    local_dir: str | Path | None = None,
    token: str | bool | None = None,
    ignore_patterns: str | list[str] | None = None,
    allow_patterns: str | list[str] | None = None,
) -> str:
    """Download a full snapshot of a repo from the Hub.

    Returns the local directory path where the snapshot was downloaded.
    """
    ...
Import
from huggingface_hub import snapshot_download
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| repo_id | str | Yes | The Hugging Face Hub repository identifier (e.g., "yuhuili/EAGLE3-LLaMA3.1-Instruct-8B" for EAGLE3, or "meta-llama/Llama-3.2-1B-Instruct" for a draft model). |
| repo_type | str or None | No | Type of repository. None is treated as "model". |
| revision | str or None | No | Git revision (branch, tag, or commit hash) to download. Defaults to the main branch. |
| cache_dir | str or Path or None | No | Directory for caching downloaded files. Defaults to the Hugging Face cache directory (~/.cache/huggingface/hub). |
| local_dir | str or Path or None | No | If set, files are placed into this directory instead of the cache structure. |
| token | str or bool or None | No | Authentication token for private or gated repositories. |
| ignore_patterns | str or list[str] or None | No | Glob patterns; files matching any of them are not downloaded. |
| allow_patterns | str or list[str] or None | No | Glob patterns; if set, only files matching at least one of them are downloaded. |
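As a rough illustration of how the optional inputs combine, the sketch below collects only the arguments that were explicitly set and leaves the rest to the library defaults. The helper name `snapshot_kwargs` and the example cache path are hypothetical; the keyword names match the signature shown on this page.

```python
from pathlib import Path
from typing import Any, Dict, Optional, Union


def snapshot_kwargs(
    repo_id: str,
    revision: Optional[str] = None,
    cache_dir: Union[str, Path, None] = None,
    token: Union[str, bool, None] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for snapshot_download, omitting unset options."""
    kwargs: Dict[str, Any] = {"repo_id": repo_id}
    if revision is not None:
        # Pinning a tag or commit hash keeps the downloaded weights reproducible.
        kwargs["revision"] = revision
    if cache_dir is not None:
        kwargs["cache_dir"] = str(cache_dir)
    if token is not None:
        kwargs["token"] = token
    return kwargs


# Usage (requires huggingface_hub and network access; not executed here):
#   from huggingface_hub import snapshot_download
#   path = snapshot_download(
#       **snapshot_kwargs(
#           "meta-llama/Llama-3.2-1B-Instruct",
#           revision="main",
#           cache_dir="/models/hf-cache",  # hypothetical path
#       )
#   )
```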
Outputs
| Name | Type | Description |
|---|---|---|
| local_path | str | The local filesystem path to the downloaded snapshot directory. This path is passed as the "model" value in the speculative_config dictionary. |
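A minimal sketch of how the returned path might flow into engine construction follows. The helper `build_engine_kwargs` is hypothetical, and the target model name in the usage comment is an assumed example rather than something specified on this page; the commented call assumes vLLM's `LLM` class accepts a `speculative_config` dictionary, as in the referenced spec_decode.py example.

```python
from typing import Any, Dict


def build_engine_kwargs(
    target_model: str,
    draft_path: str,
    method: str = "eagle3",
    num_speculative_tokens: int = 3,
) -> Dict[str, Any]:
    """Assemble engine arguments that reference a locally downloaded snapshot."""
    return {
        "model": target_model,
        "speculative_config": {
            "method": method,
            # Local directory returned by snapshot_download.
            "model": draft_path,
            "num_speculative_tokens": num_speculative_tokens,
        },
    }


# Usage (requires vLLM and suitable hardware; not executed here):
#   from vllm import LLM
#   llm = LLM(**build_engine_kwargs("meta-llama/Llama-3.1-8B-Instruct", local_path))
```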
Usage Examples
Download EAGLE Checkpoint
from huggingface_hub import snapshot_download
# Download EAGLE head for LLaMA 3.1 8B Instruct
eagle_path = snapshot_download("yuhuili/EAGLE-LLaMA3.1-Instruct-8B")
speculative_config = {
    "method": "eagle",
    "model": eagle_path,
    "num_speculative_tokens": 3,
}
Download EAGLE3 Checkpoint
from huggingface_hub import snapshot_download
# Download EAGLE3 head for LLaMA 3.1 8B Instruct
eagle3_path = snapshot_download("yuhuili/EAGLE3-LLaMA3.1-Instruct-8B")
speculative_config = {
    "method": "eagle3",
    "model": eagle3_path,
    "num_speculative_tokens": 3,
}
Download Draft Model
from huggingface_hub import snapshot_download
# Download a smaller model from the same family
draft_path = snapshot_download("meta-llama/Llama-3.2-1B-Instruct")
speculative_config = {
    "method": "draft_model",
    "model": draft_path,
    "num_speculative_tokens": 3,
}
N-gram and MTP: No Download Needed
# N-gram requires no additional model download
speculative_config_ngram = {
    "method": "ngram",
    "num_speculative_tokens": 3,
    "prompt_lookup_max": 5,
    "prompt_lookup_min": 2,
}
# MTP uses the target model's built-in heads; no download needed
speculative_config_mtp = {
    "method": "mtp",
    "num_speculative_tokens": 2,
}