# Environment: Ggml org Llama cpp Python Conversion Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Model_Conversion |
| Last Updated | 2026-02-14 22:00 GMT |
## Overview
Python 3.9+ environment with PyTorch, transformers, sentencepiece, and the gguf library for converting HuggingFace models to GGUF format and converting LoRA adapters.
## Description
This environment provides the Python runtime and package dependencies required for all model conversion scripts in llama.cpp. The primary scripts are convert_hf_to_gguf.py (11,934 lines, handles 100+ model architectures) and convert_lora_to_gguf.py (493 lines, converts LoRA adapters). The environment requires PyTorch for loading safetensors/bin model weights and transformers for model configuration parsing.
## Usage

Use this environment for the HF-to-GGUF Model Conversion and LoRA Adapter workflows. It is the mandatory prerequisite for running convert_hf_to_gguf.py, convert_lora_to_gguf.py, model inspection scripts, and logit comparison tools.
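As a sketch of a typical invocation: the wrapper below assembles an argv for convert_hf_to_gguf.py. The model path and output name are illustrative, and the `--outfile`/`--outtype` flags should be confirmed against the script's own `--help` output; the helper function is ours, not part of llama.cpp.

```python
# Hypothetical wrapper assembling the argv for convert_hf_to_gguf.py;
# paths and flag values are illustrative examples.
import sys

def build_convert_cmd(model_dir: str, outfile: str, outtype: str = "f16") -> list:
    return [
        sys.executable, "convert_hf_to_gguf.py",  # script lives in the llama.cpp repo root
        model_dir,                                # HF model directory (config.json + weights)
        "--outfile", outfile,                     # destination GGUF path
        "--outtype", outtype,                     # output tensor type, e.g. f16 or q8_0
    ]

print(build_convert_cmd("./Llama-3.2-1B", "llama-1b-f16.gguf"))
```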
## System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, Windows | Python scripts are cross-platform |
| Python | >= 3.9 | Defined in pyproject.toml |
| RAM | 2x model size | Models are loaded into RAM during conversion |
| Disk | 2x model size | Source model + output GGUF file |
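The RAM and disk rows amount to a simple rule of thumb, sketched below; the helper name is ours, not part of llama.cpp.

```python
# Rule-of-thumb resource estimate from the table above:
# roughly 2x the model size for both RAM and disk.
def conversion_footprint(model_size_gb: float) -> dict:
    return {"ram_gb": 2 * model_size_gb, "disk_gb": 2 * model_size_gb}

# e.g. a 7 GB fp16 model needs roughly 14 GB of free RAM and 14 GB of disk
print(conversion_footprint(7.0))
```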
## Dependencies

### System Packages

- python3 >= 3.9
- pip (Python package manager)

### Python Packages

- numpy ~= 1.26.4
- sentencepiece >= 0.1.98, < 0.3.0
- transformers >= 4.57.1, < 5.0.0
- gguf >= 0.1.0
- protobuf >= 4.21.0, < 5.0.0
- torch ~= 2.6.0 (standard platforms)
- torch >= 0.0.0.dev0 (s390x nightly builds only)
## Credentials

The following environment variables may be needed for accessing gated or private models:

- HF_TOKEN: HuggingFace API token (read access) for downloading gated models such as Llama
- HUGGINGFACE_HUB_TOKEN: alternative to HF_TOKEN (legacy compatibility)
## Quick Install

```sh
# Install all required packages for model conversion
pip install "numpy~=1.26.4" "sentencepiece>=0.1.98,<0.3.0" "transformers>=4.57.1,<5.0.0" "gguf>=0.1.0" "protobuf>=4.21.0,<5.0.0"

# Install PyTorch (CPU-only, sufficient for conversion)
pip install torch --index-url https://download.pytorch.org/whl/cpu

# Or install everything from the requirements file
pip install -r requirements/requirements-convert_hf_to_gguf.txt
```
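After installing, a quick sanity check can confirm that the interpreter and packages are in place. This preflight helper is a sketch of ours, not shipped with llama.cpp; the package list mirrors the dependencies above.

```python
# Preflight check: verify the interpreter version and that each
# required package is resolvable via importlib.metadata.
import sys
from importlib import metadata

REQUIRED = ["numpy", "sentencepiece", "transformers", "gguf", "protobuf", "torch"]

def preflight(packages=REQUIRED) -> list:
    missing = []
    if sys.version_info < (3, 9):
        missing.append("python>=3.9")
    for pkg in packages:
        try:
            metadata.version(pkg)  # raises if the distribution is not installed
        except metadata.PackageNotFoundError:
            missing.append(pkg)
    return missing

print(preflight())  # empty list means the environment is ready
```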
## Code Evidence
Python version requirement from `pyproject.toml:8`:

```toml
[tool.poetry.dependencies]
python = ">=3.9"
numpy = "^1.25.0"
```
Core Python dependencies from `requirements/requirements-convert_legacy_llama.txt:1-7`:

```
numpy~=1.26.4
sentencepiece>=0.1.98,<0.3.0
transformers>=4.57.1,<5.0.0
gguf>=0.1.0
protobuf>=4.21.0,<5.0.0
```
Platform-specific PyTorch from `requirements/requirements-convert_hf_to_gguf.txt:4-9`:

```
## Embedding Gemma requires PyTorch 2.6.0 or later
torch~=2.6.0; platform_machine != "s390x"
# torch s390x packages can only be found from nightly builds
torch>=0.0.0.dev0; platform_machine == "s390x"
```
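The `platform_machine` expressions above are standard PEP 508 environment markers. Assuming the `packaging` library is installed (`pip install packaging`), they can be evaluated directly, which is handy for checking which torch line applies on a given host:

```python
# Evaluate a PEP 508 environment marker against an explicit environment.
from packaging.markers import Marker

marker = Marker('platform_machine != "s390x"')
print(marker.evaluate({"platform_machine": "x86_64"}))  # standard torch applies
print(marker.evaluate({"platform_machine": "s390x"}))   # nightly torch applies
```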
HuggingFace token usage from `gguf-py/gguf/utility.py:268-269`:

```python
token = os.environ.get("HF_TOKEN")
headers = {"Authorization": f"Bearer {token}"} if token else {}
```
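The same pattern can be wrapped in a small helper: send a Bearer token only when HF_TOKEN is set, otherwise request anonymously. The function is a sketch of ours, not part of llama.cpp.

```python
import os

def auth_headers() -> dict:
    # Mirrors the gguf-py/gguf/utility.py pattern shown above:
    # add an Authorization header only when HF_TOKEN is present.
    token = os.environ.get("HF_TOKEN")
    return {"Authorization": f"Bearer {token}"} if token else {}
```

Gated repositories return HTTP 401/403 when these headers are absent, which is the usual symptom of a missing token.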
## Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: No module named 'transformers'` | Missing Python dependencies | Run `pip install -r requirements/requirements-convert_hf_to_gguf.txt` |
| `ImportError: protobuf` | Protobuf version mismatch | Set `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python` and reinstall protobuf |
| `torch not compiled with CUDA` | Wrong PyTorch build | CPU-only PyTorch is sufficient for conversion; ignore this warning |
| Access denied on gated model | Missing HuggingFace token | Set `HF_TOKEN` env var with a token that has access to the model |
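The protobuf workaround can also be applied from within a launcher script; note that the variable must be set before `google.protobuf` is first imported in the process (a sketch, not llama.cpp code):

```python
import os

# Force the pure-Python protobuf implementation; must run before
# google.protobuf is first imported anywhere in this process.
os.environ.setdefault("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "python")
print(os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"])
```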
## Compatibility Notes

- s390x (IBM Z): requires nightly PyTorch builds from https://download.pytorch.org/whl/nightly; standard PyTorch wheels are not available.
- CPU-only PyTorch: sufficient for all conversion tasks; GPU PyTorch is not needed for converting models to GGUF.
- NO_LOCAL_GGUF: setting this environment variable skips the local gguf package import and uses the pip-installed version instead.
- MODEL_ENDPOINT: defaults to https://huggingface.co/; can be overridden to point to alternative model repositories.
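The MODEL_ENDPOINT fallback described above follows the usual env-var-with-default pattern; the exact lookup inside llama.cpp may differ, so treat this as a sketch:

```python
import os

# Resolve the model host: honor MODEL_ENDPOINT if set, otherwise fall
# back to the public HuggingFace endpoint.
endpoint = os.environ.get("MODEL_ENDPOINT", "https://huggingface.co/")
print(endpoint)
```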
## Related Pages
- Implementation:Ggml_org_Llama_cpp_Conversion_Pip_Dependencies
- Implementation:Ggml_org_Llama_cpp_ModelBase_Write
- Implementation:Ggml_org_Llama_cpp_Compare_Logits
- Implementation:Ggml_org_Llama_cpp_Inspect_Org_Model
- Implementation:Ggml_org_Llama_cpp_HF_Upload_GGUF
- Implementation:Ggml_org_Llama_cpp_Convert_LoRA_To_GGUF