Environment: Unslothai Unsloth Llama Cpp
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, Model_Export, Quantization |
| Last Updated | 2026-02-07 09:00 GMT |
Overview
Build environment for llama.cpp compilation (cmake, make, gcc) required for GGUF model export and quantization.
Description
This environment provides the C/C++ build toolchain needed to compile llama.cpp, which Unsloth uses for GGUF format conversion and quantization. The build process is delegated to the `unsloth_zoo.llama_cpp` module, which handles git cloning, compilation, and artifact validation. Three build targets are required: `llama-quantize`, `llama-cli`, and `llama-server`. Colab and Kaggle are auto-detected so that temporary file paths can be adjusted accordingly.
Usage
Use this environment when calling `model.save_pretrained_gguf()` or `model.push_to_hub_gguf()` for GGUF format export. This is only needed for GGUF conversion; SafeTensors saving and Hub upload do not require llama.cpp.
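A minimal sketch of the export flow under the assumption that a model and tokenizer are loaded through `FastLanguageModel`; the model name, output directory, and quantization method shown are illustrative, not prescribed:

```python
from unsloth import FastLanguageModel

# Illustrative model choice and sequence length.
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
)

# The first GGUF save triggers the llama.cpp clone + cmake/make build
# described below, then converts and quantizes the model.
model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method="q4_k_m")

# Pushing to the Hub additionally needs a write-scoped HF_TOKEN.
model.push_to_hub_gguf("your-username/model-gguf", tokenizer,
                       quantization_method="q4_k_m")
```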
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | cmake/make compilation target |
| Hardware | CPU (GPU optional for CUDA-accelerated quantization) | Compilation is CPU-bound |
| Disk | 5 GB+ | For llama.cpp source, build artifacts, and intermediate GGUF files |
| RAM | 8 GB+ | Model conversion and quantization are memory-intensive |
Dependencies
System Packages
- `cmake` (for llama.cpp build system)
- `make` (build automation)
- `gcc` / `g++` (C/C++ compiler with C++17 support)
- `git` (for cloning llama.cpp repository)
Python Packages
- `unsloth_zoo` >= 2026.2.1 (contains `install_llama_cpp`, `check_llama_cpp`, `convert_to_gguf`, `quantize_gguf`)
- `sentencepiece` (for tokenizer conversion)
- `psutil` (for memory management during conversion)
- All packages from Python_Transformers environment
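A quick preflight check, using only the standard library, that the system packages above are actually on `PATH` before attempting a GGUF save:

```python
import shutil

# Each of these must resolve on PATH for the llama.cpp build to succeed.
missing = [tool for tool in ("cmake", "make", "gcc", "g++", "git")
           if shutil.which(tool) is None]
if missing:
    raise SystemExit(f"Missing build tools: {', '.join(missing)}")
```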
Credentials
- `HF_TOKEN`: HuggingFace API token (Write access for `push_to_hub_gguf`).
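One way to supply the token is via the environment, which `huggingface_hub` reads automatically; the value below is a placeholder (passing `token=` per call is also an option):

```python
import os

# Write-scoped token; required only for push_to_hub_gguf, not local saves.
os.environ["HF_TOKEN"] = "hf_..."
```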
Quick Install
```bash
# Install system build tools (Ubuntu/Debian)
sudo apt-get install cmake make gcc g++ git

# Install Python dependencies
pip install unsloth "unsloth_zoo>=2026.2.1" sentencepiece psutil

# llama.cpp is auto-compiled on the first GGUF save
```
Code Evidence
llama.cpp imports from `save.py:18-25`:
```python
from unsloth_zoo.llama_cpp import (
    convert_to_gguf,
    quantize_gguf,
    use_local_gguf,
    install_llama_cpp,
    check_llama_cpp,
    _download_convert_hf_to_gguf,
)
```
Build targets from `save.py:70-74`:
```python
LLAMA_CPP_TARGETS = [
    "llama-quantize",
    "llama-cli",
    "llama-server",
]
```
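A hypothetical post-build sanity check against these targets; the `llama.cpp/build/bin` layout is an assumption based on llama.cpp's default CMake output directory, not something the excerpt above guarantees:

```python
import os

LLAMA_CPP_TARGETS = ["llama-quantize", "llama-cli", "llama-server"]

# Verify each required binary landed in the build tree.
missing = [t for t in LLAMA_CPP_TARGETS
           if not os.path.exists(os.path.join("llama.cpp", "build", "bin", t))]
if missing:
    raise RuntimeError(f"llama.cpp build incomplete; missing binaries: {missing}")
```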
Environment detection from `save.py:77-81`:
```python
keynames = "\n" + "\n".join(os.environ.keys())
IS_COLAB_ENVIRONMENT = "\nCOLAB_" in keynames
IS_KAGGLE_ENVIRONMENT = "\nKAGGLE_" in keynames
```
The leading `"\n"` anchors the substring test at the start of each variable name, so only environment variables whose names begin with `COLAB_` or `KAGGLE_` match.
Broken llama.cpp directory warning from `save.py:970-980`:
```python
if os.path.exists("llama.cpp"):
    print(
        "**[WARNING]** You have a llama.cpp directory which is broken.\n"
        "Unsloth will DELETE the broken directory and install a new one.\n"
        "Press CTRL + C / cancel this if this is wrong."
    )
```
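If you would rather remove the broken checkout yourself (see the Common Errors table below), the manual equivalent is a one-liner:

```python
import shutil

# Delete the corrupted checkout; the next GGUF save re-clones and rebuilds.
shutil.rmtree("llama.cpp", ignore_errors=True)
```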
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `cmake: command not found` | cmake not installed | `sudo apt-get install cmake` |
| `make: command not found` | make not installed | `sudo apt-get install make` |
| `[WARNING] You have a llama.cpp directory which is broken` | Previous llama.cpp build was corrupted | Let Unsloth auto-delete and reinstall, or manually remove the `llama.cpp/` directory |
| `llama-quantize not found` | Build did not produce required targets | Re-run GGUF save to trigger rebuild; check cmake/gcc versions |
Compatibility Notes
- Colab/Kaggle: Unsloth auto-detects Colab and Kaggle environments and adjusts temporary file paths (Kaggle uses `/tmp` for intermediate files).
- Build delegation: All llama.cpp build logic lives in `unsloth_zoo.llama_cpp`, not in the main Unsloth repo. The `install_llama_cpp()` function handles the full git clone + cmake + make pipeline (a manual equivalent is sketched after this list).
- Auto-compilation: llama.cpp is automatically compiled on first GGUF save if not already present. Subsequent saves reuse the compiled binaries.
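For reference, a sketch of the manual equivalent of that clone + cmake + make pipeline. The repository URL and build layout are assumptions (CMake >= 3.15 accepts multiple `--target` values), and `install_llama_cpp()` may use different flags and paths:

```python
import subprocess

# Clone, configure, and build only the three targets Unsloth requires.
subprocess.run(["git", "clone", "https://github.com/ggerganov/llama.cpp"], check=True)
subprocess.run(["cmake", "-B", "llama.cpp/build", "-S", "llama.cpp"], check=True)
subprocess.run(
    ["cmake", "--build", "llama.cpp/build", "--config", "Release",
     "--target", "llama-quantize", "llama-cli", "llama-server"],
    check=True,
)
```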