Implementation:Ggml org Llama cpp Snapshot Download
| Field | Value |
|---|---|
| Implementation Name | Snapshot Download |
| Type | External Tool Doc |
| Tool | huggingface_hub (Python library) |
| Status | Active |
Overview
Description
The llama.cpp conversion script convert_hf_to_gguf.py uses huggingface_hub.snapshot_download() to acquire model configuration and tokenizer files from HuggingFace Hub when operating in remote mode (--remote flag). In this mode, model tensors are read directly from the hub via HTTP range requests, so only non-tensor files need to be downloaded locally.
The implementation is located in the main() function of convert_hf_to_gguf.py, lines 11841-11851. When --remote is specified, the positional model argument is interpreted as a HuggingFace repository ID rather than a local directory path.
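The dual meaning of the positional argument can be illustrated with a minimal sketch. The parser below is a simplified stand-in, not the script's real parse_args(), and describe_model_arg is a hypothetical helper introduced only for illustration:

```python
import argparse
from pathlib import Path


def parse_args(argv):
    # Simplified stand-in for the real parser: only the two options discussed here.
    parser = argparse.ArgumentParser()
    parser.add_argument("model", type=str,
                        help="local directory, or a repository ID when --remote is set")
    parser.add_argument("--remote", action="store_true")
    return parser.parse_args(argv)


def describe_model_arg(args):
    """Hypothetical helper: report how the positional argument is interpreted."""
    if args.remote:
        # Remote mode: the string is kept as a Hub repository ID.
        return ("repo_id", args.model)
    # Local mode: the string is treated as a directory path.
    return ("local_dir", Path(args.model))


remote_args = parse_args(["--remote", "HuggingFaceTB/SmolLM2-1.7B-Instruct"])
local_args = parse_args(["./models/SmolLM2"])
print(describe_model_arg(remote_args))
print(describe_model_arg(local_args))
```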
Usage
Invoke the conversion script with the --remote flag and a HuggingFace repository ID:
python convert_hf_to_gguf.py --remote --outtype f16 HuggingFaceTB/SmolLM2-1.7B-Instruct
For gated models, set the HF_TOKEN environment variable:
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx
python convert_hf_to_gguf.py --remote --outtype f16 meta-llama/Llama-3.1-8B-Instruct
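The same invocation can be prepared from Python, for example in an automation script. This is a sketch under assumptions: build_remote_command and has_env_token are hypothetical helper names, and the check only warns because a token may also have been saved by an earlier login rather than set in the environment:

```python
import os
import sys


def build_remote_command(repo_id, outtype="f16", script="convert_hf_to_gguf.py"):
    """Hypothetical helper: assemble the argument list for a remote conversion."""
    return [sys.executable, script, "--remote", "--outtype", outtype, repo_id]


def has_env_token(env=None):
    """Return True when HF_TOKEN is present and non-empty in the given environment."""
    env = os.environ if env is None else env
    return bool(env.get("HF_TOKEN"))


cmd = build_remote_command("meta-llama/Llama-3.1-8B-Instruct")
if not has_env_token():
    # Only a warning: authentication may still succeed with a previously saved token.
    print("HF_TOKEN is not set; a gated repository may refuse the download.")
print(" ".join(cmd))
# To actually run the conversion: subprocess.run(cmd, check=True)
```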
Code Reference
Source Location
| File | Lines | Description |
|---|---|---|
| convert_hf_to_gguf.py | 11841-11851 | snapshot_download invocation in main() |
| convert_hf_to_gguf.py | 11755-11758 | --remote argument definition in parse_args() |
Signature
The relevant code excerpt from convert_hf_to_gguf.py lines 11841-11851:
if args.remote:
    hf_repo_id = args.model
    from huggingface_hub import snapshot_download
    allowed_patterns = ["LICENSE", "*.json", "*.md", "*.txt", "tokenizer.model"]
    if args.sentence_transformers_dense_modules:
        # include sentence-transformers dense modules safetensors files
        allowed_patterns.append("*.safetensors")
    local_dir = snapshot_download(
        repo_id=hf_repo_id,
        allow_patterns=allowed_patterns)
    dir_model = Path(local_dir)
The underlying snapshot_download function from huggingface_hub accepts these key parameters:
| Parameter | Type | Description |
|---|---|---|
| repo_id | str | HuggingFace repository ID (e.g., "meta-llama/Llama-3.1-8B-Instruct") |
| allow_patterns | Optional[Union[List[str], str]] | Glob patterns for files to include in the download |
| revision | Optional[str] | Git revision (branch, tag, or commit hash) to download |
| token | Optional[Union[bool, str]] | Authentication token; when omitted, huggingface_hub falls back to a locally saved token or the HF_TOKEN environment variable |
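As a sketch of how these parameters combine, the helper below assembles keyword arguments for snapshot_download. The names build_snapshot_kwargs and fetch_metadata_files are hypothetical, and the conversion script itself passes only repo_id and allow_patterns; revision and token are shown here purely to illustrate the table above:

```python
from typing import Any, Dict, Optional

BASE_PATTERNS = ["LICENSE", "*.json", "*.md", "*.txt", "tokenizer.model"]


def build_snapshot_kwargs(
    repo_id: str,
    revision: Optional[str] = None,
    token: Optional[str] = None,
    include_safetensors: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for snapshot_download."""
    patterns = list(BASE_PATTERNS)
    if include_safetensors:
        patterns.append("*.safetensors")
    kwargs: Dict[str, Any] = {"repo_id": repo_id, "allow_patterns": patterns}
    # Optional parameters are passed only when set, so library defaults apply otherwise.
    if revision is not None:
        kwargs["revision"] = revision
    if token is not None:
        kwargs["token"] = token
    return kwargs


def fetch_metadata_files(**options: Any) -> str:
    """Download the matching files; requires huggingface_hub and network access."""
    from huggingface_hub import snapshot_download  # imported lazily on purpose
    return snapshot_download(**build_snapshot_kwargs(**options))


kwargs = build_snapshot_kwargs("HuggingFaceTB/SmolLM2-1.7B-Instruct", revision="main")
print(kwargs)
```

Calling fetch_metadata_files(repo_id="HuggingFaceTB/SmolLM2-1.7B-Instruct") would perform the actual download.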
Import
from huggingface_hub import snapshot_download
I/O Contract
| Direction | Type | Description |
|---|---|---|
| Input | str | HuggingFace repository ID (e.g., "HuggingFaceTB/SmolLM2-1.7B-Instruct") |
| Input | list[str] | Allowed file patterns: ["LICENSE", "*.json", "*.md", "*.txt", "tokenizer.model"] |
| Output | str | Local directory path containing the downloaded files |
| Side Effects | File system | Creates or populates a directory in the HuggingFace cache (~/.cache/huggingface/hub/ by default) |
| Side Effects | Network | Downloads files from https://huggingface.co/ |
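Because the output is just a directory path, the result can be inspected with pathlib. The sketch below uses a hypothetical helper, summarize_snapshot, and demonstrates it on a temporary directory that stands in for a real snapshot so that it runs without network access:

```python
import tempfile
from pathlib import Path


def summarize_snapshot(local_dir):
    """Hypothetical helper: list files under a snapshot directory."""
    root = Path(local_dir)
    files = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
    return {"has_config": "config.json" in files, "files": files}


# Stand-in for the directory that snapshot_download would return.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").write_text("{}")
    (Path(tmp) / "tokenizer.model").write_bytes(b"")
    summary = summarize_snapshot(tmp)

print(summary)
```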
Downloaded file patterns and their purpose:
| Pattern | Typical Files | Purpose |
|---|---|---|
| *.json | config.json, tokenizer_config.json, tokenizer.json, special_tokens_map.json, generation_config.json | Model architecture config, tokenizer config, special tokens, generation settings |
| tokenizer.model | tokenizer.model | SentencePiece tokenizer binary model |
| *.txt | merges.txt, vocab.txt | BPE merge rules and plain-text vocabularies |
| *.md | README.md | Model card and metadata |
| LICENSE | LICENSE | Model license file |
| *.safetensors | model.safetensors, shard files | Only included when --sentence-transformers-dense-modules is set |
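The effect of the default patterns can be approximated locally with the standard-library fnmatch module. This is only a sketch: huggingface_hub performs its own filtering, and the file list below is an illustrative assumption rather than the contents of any particular repository:

```python
from fnmatch import fnmatch

ALLOWED = ["LICENSE", "*.json", "*.md", "*.txt", "tokenizer.model"]


def is_allowed(filename, patterns=ALLOWED):
    """Return True when the file name matches at least one glob pattern."""
    return any(fnmatch(filename, pattern) for pattern in patterns)


# Illustrative file names only.
repo_files = [
    "config.json",
    "tokenizer.json",
    "tokenizer.model",
    "merges.txt",
    "README.md",
    "LICENSE",
    "model-00001-of-00002.safetensors",
    "model.safetensors.index.json",
]

selected = [name for name in repo_files if is_allowed(name)]
skipped = [name for name in repo_files if not is_allowed(name)]
print("selected:", selected)
print("skipped:", skipped)
```

Note that a shard index such as model.safetensors.index.json still matches *.json, so it is selected even though the tensor shards themselves are skipped.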
Usage Examples
Remote conversion of a public model (no authentication needed):
python convert_hf_to_gguf.py --remote --outtype auto HuggingFaceTB/SmolLM2-1.7B-Instruct
Remote conversion of a gated model (requires token):
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx
python convert_hf_to_gguf.py --remote --outtype bf16 meta-llama/Llama-3.1-8B-Instruct
Programmatic usage of snapshot_download:
from huggingface_hub import snapshot_download
from pathlib import Path
local_dir = snapshot_download(
    repo_id="HuggingFaceTB/SmolLM2-1.7B-Instruct",
    allow_patterns=["LICENSE", "*.json", "*.md", "*.txt", "tokenizer.model"],
)
dir_model = Path(local_dir)
print(f"Model files downloaded to: {dir_model}")