
Environment: ggml-org/llama.cpp Python Conversion Environment

From Leeroopedia
Knowledge Sources
Domains Infrastructure, Model_Conversion
Last Updated 2026-02-14 22:00 GMT

Overview

A Python 3.9+ environment with PyTorch, transformers, sentencepiece, and the gguf library, used for converting HuggingFace models to GGUF format and for converting LoRA adapters.

Description

This environment provides the Python runtime and package dependencies required for all model conversion scripts in llama.cpp. The primary scripts are convert_hf_to_gguf.py (11,934 lines, handles 100+ model architectures) and convert_lora_to_gguf.py (493 lines, converts LoRA adapters). The environment requires PyTorch for loading safetensors/bin model weights and transformers for model configuration parsing.

Usage

Use this environment for the HF-to-GGUF Model Conversion workflow and the LoRA Adapter Workflow. It is the mandatory prerequisite for running convert_hf_to_gguf.py, convert_lora_to_gguf.py, model inspection scripts, and logit comparison tools.

System Requirements

  • OS: Linux, macOS, Windows (Python scripts are cross-platform)
  • Python: >= 3.9 (defined in pyproject.toml)
  • RAM: 2x model size (models are loaded into RAM during conversion)
  • Disk: 2x model size (source model + output GGUF file)
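The 2x sizing guidance above can be sketched with simple arithmetic. The function name and the fp16 (2 bytes per parameter) assumption are illustrative, not from llama.cpp:

```python
# Rough sizing sketch: a model's in-memory footprint is approximately
# parameter_count * bytes_per_parameter; this page's guidance is to
# budget about 2x that for both RAM and disk.
def estimated_requirement_gib(param_count, bytes_per_param=2, factor=2):
    """Approximate RAM/disk budget in GiB, using the 2x-model-size rule."""
    model_bytes = param_count * bytes_per_param
    return factor * model_bytes / 2**30

# A 7B-parameter model stored in fp16 (2 bytes/param):
print(round(estimated_requirement_gib(7_000_000_000), 1))  # → 26.1
```

So a 7B fp16 model (~13 GiB on disk) calls for roughly 26 GiB of free RAM and disk before starting a conversion.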

Dependencies

System Packages

  • python3 >= 3.9
  • pip (Python package manager)

Python Packages

  • numpy ~= 1.26.4
  • sentencepiece >= 0.1.98, < 0.3.0
  • transformers >= 4.57.1, < 5.0.0
  • gguf >= 0.1.0
  • protobuf >= 4.21.0, < 5.0.0
  • torch ~= 2.6.0 (standard platforms)
  • torch >= 0.0.0.dev0 (s390x nightly builds only)
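The ~= ("compatible release") operator used in the numpy and torch pins can be approximated as below. This is an illustrative sketch for plain X.Y.Z versions, not pip's actual resolver, which also handles pre-releases, epochs, and wildcards:

```python
# Minimal illustration of PEP 440's "compatible release" (~=) semantics:
# ~=X.Y.Z means >=X.Y.Z while staying within the X.Y release series.
def parse(version):
    return tuple(int(part) for part in version.split("."))

def satisfies_compatible(version, pin):
    """True if `version` satisfies `~= pin` (simple X.Y.Z versions only)."""
    v, p = parse(version), parse(pin)
    return v >= p and v[: len(p) - 1] == p[: len(p) - 1]

print(satisfies_compatible("1.26.4", "1.26.4"))  # → True
print(satisfies_compatible("1.26.9", "1.26.4"))  # → True
print(satisfies_compatible("1.27.0", "1.26.4"))  # → False
```

So numpy~=1.26.4 accepts any 1.26.x at or above 1.26.4, and torch~=2.6.0 accepts 2.6.x but not 2.7.0.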

Credentials

The following environment variables may be needed for accessing gated or private models:

  • HF_TOKEN: HuggingFace API token (Read access) for downloading gated models like Llama
  • HUGGINGFACE_HUB_TOKEN: Alternative to HF_TOKEN (legacy compatibility)
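A minimal sketch of reading these variables with the legacy fallback. The helper name hf_auth_headers is ours, though the header shape mirrors the gguf-py code quoted under Code Evidence:

```python
import os

# Read HF_TOKEN first, falling back to the legacy HUGGINGFACE_HUB_TOKEN.
# hf_auth_headers is an illustrative helper, not an API from llama.cpp.
def hf_auth_headers():
    token = os.environ.get("HF_TOKEN") or os.environ.get("HUGGINGFACE_HUB_TOKEN")
    return {"Authorization": f"Bearer {token}"} if token else {}

os.environ["HF_TOKEN"] = "hf_example_token"  # placeholder, not a real token
print(hf_auth_headers())  # → {'Authorization': 'Bearer hf_example_token'}
```

With neither variable set, the helper returns an empty dict and requests proceed unauthenticated, which is enough for public models but not gated ones.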

Quick Install

# Install all required packages for model conversion
pip install numpy~=1.26.4 "sentencepiece>=0.1.98,<0.3.0" "transformers>=4.57.1,<5.0.0" "gguf>=0.1.0" "protobuf>=4.21.0,<5.0.0"

# Install PyTorch (CPU-only, sufficient for conversion)
pip install torch --index-url https://download.pytorch.org/whl/cpu

# Or install everything from requirements file
pip install -r requirements/requirements-convert_hf_to_gguf.txt

Code Evidence

Python version requirement from pyproject.toml:8:

[tool.poetry.dependencies]
python = ">=3.9"
numpy = "^1.25.0"

Core Python dependencies from requirements/requirements-convert_legacy_llama.txt:1-7:

numpy~=1.26.4
sentencepiece>=0.1.98,<0.3.0
transformers>=4.57.1,<5.0.0
gguf>=0.1.0
protobuf>=4.21.0,<5.0.0

Platform-specific PyTorch from requirements/requirements-convert_hf_to_gguf.txt:4-9:

## Embedding Gemma requires PyTorch 2.6.0 or later
torch~=2.6.0; platform_machine != "s390x"
# torch s390x packages can only be found from nightly builds
torch>=0.0.0.dev0; platform_machine == "s390x"
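pip evaluates these platform_machine environment markers at install time; a rough sketch of the same decision in Python:

```python
import platform

# pip resolves the marker against the running interpreter's machine type,
# e.g. "x86_64", "arm64", or "s390x".
machine = platform.machine()
torch_requirement = (
    "torch>=0.0.0.dev0" if machine == "s390x" else "torch~=2.6.0"
)
print(torch_requirement)
```

On anything other than IBM Z hardware this selects the stable torch~=2.6.0 pin.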

HuggingFace token usage from gguf-py/gguf/utility.py:268-269:

token = os.environ.get("HF_TOKEN")
headers = {"Authorization": f"Bearer {token}"} if token else {}

Common Errors

  • ModuleNotFoundError: No module named 'transformers'. Cause: missing Python dependencies. Solution: run pip install -r requirements/requirements-convert_hf_to_gguf.txt
  • ImportError: protobuf. Cause: protobuf version mismatch. Solution: set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python and reinstall protobuf
  • torch not compiled with CUDA. Cause: wrong PyTorch build. Solution: CPU-only PyTorch is sufficient for conversion; ignore this warning
  • Access denied on gated model. Cause: missing HuggingFace token. Solution: set the HF_TOKEN env var with a token that has access to the model
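A stdlib-only preflight check (our own sketch, not part of llama.cpp) can catch the ModuleNotFoundError case before launching the converter:

```python
# Check that the modules the conversion scripts import are installed.
# Uses only the standard library, so it runs even in a bare environment.
import importlib.util

REQUIRED = ["numpy", "sentencepiece", "transformers", "gguf", "torch"]

def missing_modules(names=REQUIRED):
    """Return the subset of `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

missing = missing_modules()
if missing:
    print("Missing (run the Quick Install above):", ", ".join(missing))
else:
    print("All conversion dependencies found.")
```

Running this before convert_hf_to_gguf.py turns a mid-conversion traceback into an actionable one-line report.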

Compatibility Notes

  • s390x (IBM Z): Requires nightly PyTorch builds from https://download.pytorch.org/whl/nightly. Standard PyTorch wheels are not available.
  • CPU-only PyTorch: Sufficient for all conversion tasks. GPU PyTorch is not needed for converting models to GGUF.
  • NO_LOCAL_GGUF: Setting this environment variable skips the local gguf package import and uses the pip-installed version instead.
  • MODEL_ENDPOINT: Defaults to https://huggingface.co/. Can be overridden to point to alternative model repositories.
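A minimal sketch of the MODEL_ENDPOINT lookup with its documented default; the helper name is ours:

```python
import os

# MODEL_ENDPOINT overrides where model files are fetched from; when unset,
# the documented default https://huggingface.co/ is used.
def model_endpoint():
    return os.environ.get("MODEL_ENDPOINT", "https://huggingface.co/")

print(model_endpoint())  # the documented default if the variable is unset
os.environ["MODEL_ENDPOINT"] = "https://mirror.example.org/"
print(model_endpoint())  # now the override
```

Pointing MODEL_ENDPOINT at an internal mirror lets air-gapped or rate-limited environments reuse the same conversion scripts unchanged.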
