
Environment: Unslothai Unsloth Llama Cpp

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Model_Export, Quantization
Last Updated: 2026-02-07 09:00 GMT

Overview

Build environment for llama.cpp compilation (cmake, make, gcc) required for GGUF model export and quantization.

Description

This environment provides the C/C++ build toolchain needed to compile llama.cpp, which Unsloth uses for GGUF format conversion and quantization. The build process is delegated to the `unsloth_zoo.llama_cpp` module, which handles git cloning, compilation, and artifact validation. Three build targets are required: `llama-quantize`, `llama-cli`, and `llama-server`. Colab and Kaggle runtimes are auto-detected for path management.

Usage

Use this environment when calling `model.save_pretrained_gguf()` or `model.push_to_hub_gguf()` for GGUF format export. This is only needed for GGUF conversion; SafeTensors saving and Hub upload do not require llama.cpp.
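As an illustrative sketch of the two entry points named above (the `model` and `tokenizer` objects are assumed to come from Unsloth's `FastLanguageModel`, and the `"q4_k_m"` quantization method name is an assumption, not taken from this page):

```python
# Hedged sketch: exporting a fine-tuned model to GGUF via Unsloth.
# `model`/`tokenizer` and the "q4_k_m" method name are assumptions here.

def export_gguf(model, tokenizer, out_dir="gguf_out", quant_method="q4_k_m"):
    """Save locally as GGUF; llama.cpp is auto-compiled on the first call."""
    model.save_pretrained_gguf(out_dir, tokenizer, quantization_method=quant_method)

def push_gguf(model, tokenizer, repo_id, token, quant_method="q4_k_m"):
    """Upload GGUF to the HuggingFace Hub (requires a Write-access HF_TOKEN)."""
    model.push_to_hub_gguf(repo_id, tokenizer,
                           quantization_method=quant_method, token=token)
```

Neither helper is needed for SafeTensors saving, which bypasses llama.cpp entirely.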

System Requirements

Category | Requirement | Notes
OS | Linux | cmake/make compilation target
Hardware | CPU (GPU optional for CUDA-accelerated quantization) | Compilation is CPU-bound
Disk | 5 GB+ | For llama.cpp source, build artifacts, and intermediate GGUF files
RAM | 8 GB+ | Model conversion and quantization are memory-intensive
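The disk and RAM figures above can be verified before starting a conversion run. A minimal sketch using only the standard library (the RAM probe relies on `os.sysconf`, so this assumes Linux, which matches the OS requirement):

```python
import os
import shutil

def check_resources(path=".", min_disk_gb=5, min_ram_gb=8):
    """Return (disk_ok, ram_ok) against the requirements table above.

    The RAM probe uses os.sysconf and therefore assumes Linux.
    """
    free_gb = shutil.disk_usage(path).free / 1024**3
    total_ram_gb = (os.sysconf("SC_PAGE_SIZE")
                    * os.sysconf("SC_PHYS_PAGES")) / 1024**3
    return free_gb >= min_disk_gb, total_ram_gb >= min_ram_gb
```

Running this before `save_pretrained_gguf()` avoids a failed conversion after a long compile.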

Dependencies

System Packages

  • `cmake` (for llama.cpp build system)
  • `make` (build automation)
  • `gcc` / `g++` (C/C++ compiler with C++17 support)
  • `git` (for cloning llama.cpp repository)

Python Packages

  • `unsloth_zoo` >= 2026.2.1 (contains `install_llama_cpp`, `check_llama_cpp`, `convert_to_gguf`, `quantize_gguf`)
  • `sentencepiece` (for tokenizer conversion)
  • `psutil` (for memory management during conversion)
  • All packages from Python_Transformers environment

Credentials

  • `HF_TOKEN`: HuggingFace API token (Write access for `push_to_hub_gguf`).

Quick Install

# Install system build tools (Ubuntu/Debian)
sudo apt-get install cmake make gcc g++ git

# Install Python dependencies
pip install unsloth "unsloth_zoo>=2026.2.1" sentencepiece psutil

# llama.cpp is auto-compiled on first GGUF save

Code Evidence

llama.cpp imports from `save.py:18-25`:

from unsloth_zoo.llama_cpp import (
    convert_to_gguf,
    quantize_gguf,
    use_local_gguf,
    install_llama_cpp,
    check_llama_cpp,
    _download_convert_hf_to_gguf,
)

Build targets from `save.py:70-74`:

LLAMA_CPP_TARGETS = [
    "llama-quantize",
    "llama-cli",
    "llama-server",
]

Environment detection from `save.py:77-81`:

keynames = "\n" + "\n".join(os.environ.keys())
IS_COLAB_ENVIRONMENT = "\nCOLAB_" in keynames
IS_KAGGLE_ENVIRONMENT = "\nKAGGLE_" in keynames
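The detection trick above scans environment-variable names (not values) for a prefixed substring. A self-contained sketch reproducing that logic, parameterized so it can be exercised against any mapping:

```python
import os

def detect_hosted_notebook(environ=None):
    """Detect Colab/Kaggle by scanning environment-variable *names*,
    mirroring the keynames scan shown above from save.py."""
    environ = os.environ if environ is None else environ
    # Leading "\n" ensures the prefix match anchors at the start of a name.
    keynames = "\n" + "\n".join(environ.keys())
    return {
        "colab": "\nCOLAB_" in keynames,
        "kaggle": "\nKAGGLE_" in keynames,
    }
```

The leading newline matters: it prevents a variable like `MY_COLAB_DIR` from matching, since only names beginning with `COLAB_` produce the `"\nCOLAB_"` substring.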

Broken llama.cpp directory warning from `save.py:970-980`:

if os.path.exists("llama.cpp"):
    print(
        "**[WARNING]** You have a llama.cpp directory which is broken.\n"
        "Unsloth will DELETE the broken directory and install a new one.\n"
        "Press CTRL + C / cancel this if this is wrong."
    )
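The guard above only fires when a `llama.cpp` directory already exists, and Unsloth then deletes and reinstalls it. A self-contained sketch of the same check-and-reset pattern, demonstrated against a throwaway temporary directory rather than a real checkout (the `reset_if_present` helper is illustrative, not part of Unsloth's API):

```python
import os
import shutil
import tempfile

def reset_if_present(build_dir):
    """If a (possibly broken) build directory exists, delete it so a fresh
    clone/compile can proceed; returns True when a reset happened."""
    if os.path.exists(build_dir):
        print(f"**[WARNING]** Deleting existing directory: {build_dir}")
        shutil.rmtree(build_dir)
        return True
    return False

# Demonstration against a temporary directory, not a real llama.cpp tree.
workspace = tempfile.mkdtemp()
target = os.path.join(workspace, "llama.cpp")
os.makedirs(target)
first = reset_if_present(target)    # directory existed, so it is removed
second = reset_if_present(target)   # second call: nothing left to remove
shutil.rmtree(workspace)
```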

Common Errors

Error Message | Cause | Solution
`cmake: command not found` | cmake not installed | `sudo apt-get install cmake`
`make: command not found` | make not installed | `sudo apt-get install make`
`[WARNING] You have a llama.cpp directory which is broken` | Previous llama.cpp build was corrupted | Let Unsloth auto-delete and reinstall, or manually remove the `llama.cpp/` directory
`llama-quantize not found` | Build did not produce required targets | Re-run GGUF save to trigger a rebuild; check cmake/gcc versions
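The first two errors in the table can be caught up front rather than mid-build. A sketch that checks for the required system tools on PATH before attempting a GGUF save (the helper name is illustrative):

```python
import shutil

# Tools listed under System Packages above.
REQUIRED_TOOLS = ("cmake", "make", "gcc", "g++", "git")

def missing_build_tools(tools=REQUIRED_TOOLS):
    """Return the subset of build tools not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

# Example: report missing tools before triggering llama.cpp compilation.
missing = missing_build_tools()
if missing:
    print("Missing build tools: " + ", ".join(missing))
    print("Install them, e.g.: sudo apt-get install " + " ".join(missing))
```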

Compatibility Notes

  • Colab/Kaggle: Unsloth auto-detects Colab and Kaggle environments and adjusts temporary file paths (Kaggle uses `/tmp` for intermediate files).
  • Build delegation: All llama.cpp build logic is in `unsloth_zoo.llama_cpp`, not in the main Unsloth repo. The `install_llama_cpp()` function handles the full git clone + cmake + make pipeline.
  • Auto-compilation: llama.cpp is automatically compiled on first GGUF save if not already present. Subsequent saves reuse the compiled binaries.
