# Principle: ggml-org/llama.cpp Model Distribution
| Field | Value |
|---|---|
| Principle Name | Model Distribution |
| Category | Model Publishing |
| Scope | Distributing converted models to model hubs |
| Status | Active |
## Overview
### Description
After a model has been converted to GGUF format and verified for correctness, the final step in the conversion pipeline is distribution -- making the converted model available to others. Distribution typically involves uploading the GGUF file to a model hub (primarily HuggingFace Hub) where it can be discovered, downloaded, and used for inference.
Model distribution addresses several concerns:
- Discoverability: Uploaded models should be associated with metadata (model card, tags, file names) that allows users to find them through search and filtering.
- Naming conventions: GGUF files should follow established naming patterns that communicate the model identity, quantization type, and other relevant parameters. This allows users to select the appropriate variant for their hardware constraints.
- Repository organization: A single HuggingFace repository may host multiple GGUF variants of the same base model (e.g., f16, q8_0, q4_K_M). Consistent file naming within the repository helps users identify the right file.
- Authentication: Uploading to HuggingFace Hub requires authentication via an API token. The token must have write permissions for the target repository.
- Integrity: The hub should verify file integrity upon upload. HuggingFace Hub computes checksums automatically, but uploaders should verify their local files before uploading to avoid distributing corrupted models.
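A local integrity check before upload can be as simple as streaming a SHA-256 over the file. The sketch below is illustrative (the chunked read is just one reasonable way to hash multi-gigabyte GGUF files without loading them into memory):

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Hash a file in chunks so multi-gigabyte GGUF files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing this digest against one recorded right after conversion catches corruption introduced by a failed copy or a flaky disk before the file is ever published.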
## Usage
The distribution step follows this workflow:
- Verify the converted GGUF file passes all quality checks (logit comparison, token verification)
- Choose or create a target HuggingFace repository
- Authenticate with the hub (set the `HF_TOKEN` environment variable)
- Upload the GGUF file with an appropriate filename
- Verify the upload by checking the repository page
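The upload steps above can be sketched with the `huggingface_hub` Python library. The repository and file names below are hypothetical, and the library is imported lazily so the helper can be shown without it installed; treat this as a sketch, not the pipeline's actual upload script:

```python
import os

def upload_gguf(local_path: str, repo_id: str, filename: str) -> None:
    """Upload a verified GGUF file, authenticating via the HF_TOKEN env var."""
    # Imported lazily so this sketch parses without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi(token=os.environ["HF_TOKEN"])
    # Create the target repository if it does not exist yet (no-op otherwise).
    api.create_repo(repo_id, repo_type="model", exist_ok=True)
    api.upload_file(
        path_or_fileobj=local_path,
        path_in_repo=filename,
        repo_id=repo_id,
    )

if __name__ == "__main__":
    # Hypothetical names for illustration only.
    upload_gguf(
        "llama-3.1-8b-instruct-q8_0.gguf",
        "your-username/llama-3.1-8b-instruct-gguf",
        "llama-3.1-8b-instruct-q8_0.gguf",
    )
```

After the call completes, the final verification step is still manual: open the repository page and confirm the file is listed with the expected size.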
## Theoretical Basis
### Model Registry Principles
Model distribution follows the same principles as software artifact registries (npm, PyPI, Docker Hub):
Immutable artifacts: Once a model file is uploaded and referenced by users, it should not be silently replaced. Version control (via Git-based repositories on HuggingFace Hub) ensures that every revision is preserved and previous versions remain accessible.
Metadata co-location: The model file alone is insufficient for users to understand its provenance, capabilities, and limitations. A model card (README.md) should accompany the upload, documenting the base model, conversion parameters, quantization type, and any known limitations.
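On HuggingFace Hub, much of this metadata lives in the model card's YAML front matter, which also drives search filters. A minimal, hypothetical example (field values are placeholders, not taken from a real repository):

```yaml
# Front matter at the top of the repository's README.md (hypothetical values).
base_model: meta-llama/Llama-3.1-8B-Instruct
tags:
  - gguf
  - llama.cpp
license: llama3.1
```

The body of the README can then document the conversion parameters, the quantization variants available, and any known limitations in prose.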
Access control layers: Not all models should be publicly accessible. Distribution mechanisms should support both public uploads and restricted access (gated models, private repositories) depending on the model's license terms.
### File Naming Conventions
The GGUF community has adopted informal naming conventions that encode key parameters in the filename:
`{model-name}-{quantization}.gguf`
For example:
- `llama-3.1-8b-instruct-f16.gguf` -- float16 conversion
- `llama-3.1-8b-instruct-q8_0.gguf` -- 8-bit quantized
- `llama-3.1-8b-instruct-q4_K_M.gguf` -- 4-bit K-quant medium
These names allow users to quickly identify the trade-off between file size, memory requirements, and inference quality.
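Because quantization labels like `q4_K_M` contain no hyphens, the convention can be applied and inverted mechanically. The two helpers below are hypothetical illustrations of that property, not part of any real tooling:

```python
def gguf_filename(model_name: str, quantization: str) -> str:
    """Build a filename following the {model-name}-{quantization}.gguf convention."""
    return f"{model_name.lower()}-{quantization}.gguf"

def parse_gguf_filename(filename: str) -> tuple[str, str]:
    """Split a conventional GGUF filename back into (model name, quantization).

    Relies on the quantization label itself containing no hyphens, so the
    last hyphen-separated token is always the quantization type.
    """
    stem = filename.removesuffix(".gguf")
    model_name, _, quantization = stem.rpartition("-")
    return model_name, quantization
```

For example, `parse_gguf_filename("llama-3.1-8b-instruct-q4_K_M.gguf")` yields `("llama-3.1-8b-instruct", "q4_K_M")`.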
### Upload Atomicity
Large model files (tens of gigabytes) require careful upload handling. The HuggingFace Hub API supports:
- Resumable uploads: If an upload is interrupted, it can be resumed without re-uploading the entire file.
- Commit-based uploads: Each upload creates a Git commit in the repository, providing an audit trail and the ability to revert.
- LFS storage: Files larger than 10MB are automatically stored via Git LFS, ensuring the repository remains manageable.
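The commit-based model means several variants can land as a single, revertible Git commit. A hedged sketch using `huggingface_hub` (repository name hypothetical; the import is lazy so the sketch stands alone):

```python
import os

def commit_gguf_variants(repo_id: str, paths: list[str], message: str) -> None:
    """Upload several GGUF variants in one Git commit (all land together or not at all)."""
    # Imported lazily so this sketch parses without huggingface_hub installed.
    from huggingface_hub import CommitOperationAdd, HfApi

    api = HfApi()  # falls back to the locally saved token / HF_TOKEN env var
    operations = [
        CommitOperationAdd(path_in_repo=os.path.basename(p), path_or_fileobj=p)
        for p in paths
    ]
    # One create_commit call produces one Git commit, which can later be
    # reverted or pinned by its revision hash.
    api.create_commit(repo_id=repo_id, operations=operations, commit_message=message)
```

Grouping the `f16`, `q8_0`, and `q4_K_M` variants into one commit keeps the repository history aligned with logical releases rather than individual file transfers.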