Implementation:Ggml org Llama cpp Snapshot Download

From Leeroopedia
Field | Value
Implementation Name | Snapshot Download
Type | External Tool Doc
Tool | huggingface_hub (Python library)
Status | Active

Overview

Description

The llama.cpp conversion script convert_hf_to_gguf.py uses huggingface_hub.snapshot_download() to acquire model configuration and tokenizer files from HuggingFace Hub when operating in remote mode (--remote flag). In this mode, model tensors are read directly from the hub via HTTP range requests, so only non-tensor files need to be downloaded locally.

The implementation is located in the main() function of convert_hf_to_gguf.py, lines 11841-11851. When --remote is specified, the positional model argument is interpreted as a HuggingFace repository ID rather than a local directory path.
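The way --remote changes the meaning of the positional argument can be sketched with a reduced argument parser. This is an illustrative stand-in, not the script's actual parse_args(): build_parser and describe_model_source are hypothetical names, and the real script defines many more options.

```python
import argparse
from pathlib import Path


def build_parser():
    # Simplified stand-in for the script's parser (assumption: only two arguments shown)
    parser = argparse.ArgumentParser()
    parser.add_argument("model", type=str)
    parser.add_argument("--remote", action="store_true")
    return parser


def describe_model_source(argv):
    """Return ("hub", repo_id) in remote mode, else ("local", directory path)."""
    args = build_parser().parse_args(argv)
    if args.remote:
        # The positional argument is treated as a repository ID string
        return ("hub", args.model)
    # Without --remote, the same argument is treated as a local directory
    return ("local", Path(args.model))


print(describe_model_source(["--remote", "HuggingFaceTB/SmolLM2-1.7B-Instruct"]))
print(describe_model_source(["./models/SmolLM2"]))
```

The same string therefore ends up either as a repo_id passed to snapshot_download or as a filesystem path, depending solely on the flag.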

Usage

Invoke the conversion script with the --remote flag and a HuggingFace repository ID:

python convert_hf_to_gguf.py --remote --outtype f16 HuggingFaceTB/SmolLM2-1.7B-Instruct

For gated models, set the HF_TOKEN environment variable:

export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx
python convert_hf_to_gguf.py --remote --outtype f16 meta-llama/Llama-3.1-8B-Instruct

Code Reference

Source Location

File | Lines | Description
convert_hf_to_gguf.py | 11841-11851 | snapshot_download invocation in main()
convert_hf_to_gguf.py | 11755-11758 | --remote argument definition in parse_args()

Signature

The relevant code excerpt from convert_hf_to_gguf.py lines 11841-11851:

if args.remote:
    hf_repo_id = args.model
    from huggingface_hub import snapshot_download
    allowed_patterns = ["LICENSE", "*.json", "*.md", "*.txt", "tokenizer.model"]
    if args.sentence_transformers_dense_modules:
        # include sentence-transformers dense modules safetensors files
        allowed_patterns.append("*.safetensors")
    local_dir = snapshot_download(
        repo_id=hf_repo_id,
        allow_patterns=allowed_patterns)
    dir_model = Path(local_dir)

The underlying snapshot_download function from huggingface_hub accepts these key parameters:

Parameter | Type | Description
repo_id | str | HuggingFace repository ID (e.g., "meta-llama/Llama-3.1-8B-Instruct")
allow_patterns | Optional[Union[List[str], str]] | Glob patterns for files to include in the download
revision | Optional[str] | Git revision (branch, tag, or commit hash) to download
token | Optional[Union[bool, str]] | Authentication token; when omitted, huggingface_hub falls back to a locally stored token such as the HF_TOKEN environment variable
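The parameters above can be combined as in the following sketch. The conversion script itself passes only repo_id and allow_patterns; pinning a revision or passing a token explicitly is shown here as an assumption about how a caller might extend it. build_download_kwargs and fetch_metadata are hypothetical helper names.

```python
BASE_PATTERNS = ["LICENSE", "*.json", "*.md", "*.txt", "tokenizer.model"]


def build_download_kwargs(repo_id, revision=None, token=None):
    """Assemble keyword arguments for snapshot_download (hypothetical helper)."""
    kwargs = {"repo_id": repo_id, "allow_patterns": list(BASE_PATTERNS)}
    if revision is not None:
        # Branch, tag, or commit hash to download
        kwargs["revision"] = revision
    if token is not None:
        # Explicit token; leaving it out lets huggingface_hub use its stored token
        kwargs["token"] = token
    return kwargs


def fetch_metadata(repo_id, revision=None, token=None):
    """Download the non-tensor files and return the local directory path."""
    # Imported lazily so the kwargs helper works without the library installed
    from huggingface_hub import snapshot_download
    return snapshot_download(**build_download_kwargs(repo_id, revision, token))


print(build_download_kwargs("HuggingFaceTB/SmolLM2-1.7B-Instruct", revision="main"))
```

Calling fetch_metadata requires network access and, for gated repositories, a valid token.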

Import

from huggingface_hub import snapshot_download

I/O Contract

Direction | Type | Description
Input | str | HuggingFace repository ID (e.g., "HuggingFaceTB/SmolLM2-1.7B-Instruct")
Input | list[str] | Allowed file patterns: ["LICENSE", "*.json", "*.md", "*.txt", "tokenizer.model"]
Output | str | Local directory path containing the downloaded files
Side Effects | File system | Creates or populates a directory in the HuggingFace cache (~/.cache/huggingface/hub/ by default)
Side Effects | Network | Downloads files from https://huggingface.co/
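The cache location named in the file-system side effect is a default and can be relocated through environment variables. The sketch below approximates the lookup order using the HF_HUB_CACHE and HF_HOME variables that huggingface_hub documents; it is a simplification that ignores other variables (for example XDG_CACHE_HOME), and resolve_hub_cache is a hypothetical helper, not a library function.

```python
import os
from pathlib import Path


def resolve_hub_cache(env=None):
    """Approximate where downloaded snapshots are cached (simplified assumption)."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        # An explicit hub cache directory takes precedence
        return Path(env["HF_HUB_CACHE"]).expanduser()
    if env.get("HF_HOME"):
        # Otherwise the hub cache lives under the Hugging Face home directory
        return Path(env["HF_HOME"]).expanduser() / "hub"
    # Fallback matching the default path listed in the table above
    return Path.home() / ".cache" / "huggingface" / "hub"


print(resolve_hub_cache())
```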

Downloaded file patterns and their purpose:

Pattern | Typical Files | Purpose
*.json | config.json, tokenizer_config.json, tokenizer.json, special_tokens_map.json, generation_config.json | Model architecture config, tokenizer config, special tokens, generation settings
tokenizer.model | tokenizer.model | SentencePiece tokenizer binary model
*.txt | vocab.txt, merges.txt | Tokenizer vocabulary and BPE merge rules
*.md | README.md | Model card and metadata
LICENSE | LICENSE | Model license file
*.safetensors | model.safetensors, shard files | Only included when --sentence-transformers-dense-modules is set
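The effect of these patterns can be previewed offline with Python's fnmatch module. This is a sketch under the assumption that allow_patterns behaves like fnmatch-style glob matching; select_files and the repo_files listing are illustrative and do not come from a real repository query.

```python
from fnmatch import fnmatch

ALLOWED_PATTERNS = ["LICENSE", "*.json", "*.md", "*.txt", "tokenizer.model"]


def select_files(filenames, patterns):
    """Return the filenames that match at least one glob pattern."""
    return [name for name in filenames if any(fnmatch(name, p) for p in patterns)]


# Hypothetical repository listing used only for illustration
repo_files = [
    "config.json",
    "tokenizer.json",
    "tokenizer.model",
    "merges.txt",
    "README.md",
    "LICENSE",
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
]

selected = select_files(repo_files, ALLOWED_PATTERNS)
with_dense_modules = select_files(repo_files, ALLOWED_PATTERNS + ["*.safetensors"])
print(selected)
print(with_dense_modules)
```

With the default patterns the weight shards are skipped, which is what keeps the remote-mode download small; appending "*.safetensors" brings them back in.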

Usage Examples

Remote conversion of a public model (no authentication needed):

python convert_hf_to_gguf.py --remote --outtype auto HuggingFaceTB/SmolLM2-1.7B-Instruct

Remote conversion of a gated model (requires token):

export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx
python convert_hf_to_gguf.py --remote --outtype bf16 meta-llama/Llama-3.1-8B-Instruct

Programmatic usage of snapshot_download:

from huggingface_hub import snapshot_download
from pathlib import Path

local_dir = snapshot_download(
    repo_id="HuggingFaceTB/SmolLM2-1.7B-Instruct",
    allow_patterns=["LICENSE", "*.json", "*.md", "*.txt", "tokenizer.model"]
)
dir_model = Path(local_dir)
print(f"Model files downloaded to: {dir_model}")
