Environment: Unslothai Unsloth Llama Cpp
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, Model_Export, Quantization |
| Last Updated | 2026-02-07 09:00 GMT |
Overview
Build environment for llama.cpp compilation (cmake, make, gcc) required for GGUF model export and quantization.
Description
This environment provides the C/C++ build toolchain needed to compile llama.cpp, which Unsloth uses for GGUF format conversion and quantization. The build process is delegated to the `unsloth_zoo.llama_cpp` module, which handles git cloning, compilation, and artifact validation. Three build targets are required: `llama-quantize`, `llama-cli`, and `llama-server`. Colab and Kaggle are auto-detected so that temporary file paths can be adjusted accordingly.
Usage
Use this environment when calling `model.save_pretrained_gguf()` or `model.push_to_hub_gguf()` for GGUF format export. This is only needed for GGUF conversion; SafeTensors saving and Hub upload do not require llama.cpp.
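A minimal sketch of the export flow under the assumption that a model and tokenizer are loaded through `FastLanguageModel`; the model name, output directory, and quantization method shown are illustrative, not prescribed:

```python
from unsloth import FastLanguageModel

# Illustrative model choice and sequence length.
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
)

# The first GGUF save triggers the llama.cpp clone + cmake/make build
# described below, then converts and quantizes the model.
model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method="q4_k_m")

# Pushing to the Hub additionally needs a write-scoped HF_TOKEN.
model.push_to_hub_gguf("your-username/model-gguf", tokenizer,
                       quantization_method="q4_k_m")
```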
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | cmake/make compilation target |
| Hardware | CPU (GPU optional for CUDA-accelerated quantization) | Compilation is CPU-bound |
| Disk | 5 GB+ | For llama.cpp source, build artifacts, and intermediate GGUF files |
| RAM | 8 GB+ | Model conversion and quantization are memory-intensive |
Dependencies
System Packages
- `cmake` (for llama.cpp build system)
- `make` (build automation)
- `gcc` / `g++` (C/C++ compiler with C++17 support)
- `git` (for cloning llama.cpp repository)
Python Packages
- `unsloth_zoo` >= 2026.2.1 (contains `install_llama_cpp`, `check_llama_cpp`, `convert_to_gguf`, `quantize_gguf`)
- `sentencepiece` (for tokenizer conversion)
- `psutil` (for memory management during conversion)
- All packages from Python_Transformers environment
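A quick preflight check, using only the standard library, that the system packages above are actually on `PATH` before attempting a GGUF save:

```python
import shutil

# Each of these must resolve on PATH for the llama.cpp build to succeed.
missing = [tool for tool in ("cmake", "make", "gcc", "g++", "git")
           if shutil.which(tool) is None]
if missing:
    raise SystemExit(f"Missing build tools: {', '.join(missing)}")
```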
Credentials
- `HF_TOKEN`: HuggingFace API token (Write access for `push_to_hub_gguf`).
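One way to supply the token is via the environment, which `huggingface_hub` reads automatically; the value below is a placeholder (passing `token=` per call is also an option):

```python
import os

# Write-scoped token; required only for push_to_hub_gguf, not local saves.
os.environ["HF_TOKEN"] = "hf_..."
```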
Quick Install
```bash
# Install system build tools (Ubuntu/Debian)
sudo apt-get install cmake make gcc g++ git

# Install Python dependencies
pip install unsloth "unsloth_zoo>=2026.2.1" sentencepiece psutil

# llama.cpp is auto-compiled on the first GGUF save
```
Code Evidence
llama.cpp imports from `save.py:18-25`:
```python
from unsloth_zoo.llama_cpp import (
    convert_to_gguf,
    quantize_gguf,
    use_local_gguf,
    install_llama_cpp,
    check_llama_cpp,
    _download_convert_hf_to_gguf,
)
```
Build targets from `save.py:70-74`:
```python
LLAMA_CPP_TARGETS = [
    "llama-quantize",
    "llama-cli",
    "llama-server",
]
```
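A hypothetical post-build sanity check against these targets; the `llama.cpp/build/bin` layout is an assumption based on llama.cpp's default CMake output directory, not something the excerpt above guarantees:

```python
import os

LLAMA_CPP_TARGETS = ["llama-quantize", "llama-cli", "llama-server"]

# Verify each required binary landed in the build tree.
missing = [t for t in LLAMA_CPP_TARGETS
           if not os.path.exists(os.path.join("llama.cpp", "build", "bin", t))]
if missing:
    raise RuntimeError(f"llama.cpp build incomplete; missing binaries: {missing}")
```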
Environment detection from `save.py:77-81`:
```python
keynames = "\n" + "\n".join(os.environ.keys())
IS_COLAB_ENVIRONMENT = "\nCOLAB_" in keynames
IS_KAGGLE_ENVIRONMENT = "\nKAGGLE_" in keynames
```
The leading `"\n"` anchors the substring test at the start of each variable name, so only environment variables whose names begin with `COLAB_` or `KAGGLE_` match.
Broken llama.cpp directory warning from `save.py:970-980`:
```python
if os.path.exists("llama.cpp"):
    print(
        "**[WARNING]** You have a llama.cpp directory which is broken.\n"
        "Unsloth will DELETE the broken directory and install a new one.\n"
        "Press CTRL + C / cancel this if this is wrong."
    )
```
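If you would rather remove the broken checkout yourself (see the Common Errors table below), the manual equivalent is a one-liner:

```python
import shutil

# Delete the corrupted checkout; the next GGUF save re-clones and rebuilds.
shutil.rmtree("llama.cpp", ignore_errors=True)
```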
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `cmake: command not found` | cmake not installed | `sudo apt-get install cmake` |
| `make: command not found` | make not installed | `sudo apt-get install make` |
| `[WARNING] You have a llama.cpp directory which is broken` | Previous llama.cpp build was corrupted | Let Unsloth auto-delete and reinstall, or manually remove the `llama.cpp/` directory |
| `llama-quantize not found` | Build did not produce required targets | Re-run GGUF save to trigger rebuild; check cmake/gcc versions |
Compatibility Notes
- Colab/Kaggle: Unsloth auto-detects Colab and Kaggle environments and adjusts temporary file paths (Kaggle uses `/tmp` for intermediate files).
- Build delegation: All llama.cpp build logic lives in `unsloth_zoo.llama_cpp`, not in the main Unsloth repo. The `install_llama_cpp()` function handles the full git clone + cmake + make pipeline (a manual equivalent is sketched after this list).
- Auto-compilation: llama.cpp is automatically compiled on first GGUF save if not already present. Subsequent saves reuse the compiled binaries.
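For reference, a sketch of the manual equivalent of that clone + cmake + make pipeline. The repository URL and build layout are assumptions (CMake >= 3.15 accepts multiple `--target` values), and `install_llama_cpp()` may use different flags and paths:

```python
import subprocess

# Clone, configure, and build only the three targets Unsloth requires.
subprocess.run(["git", "clone", "https://github.com/ggerganov/llama.cpp"], check=True)
subprocess.run(["cmake", "-B", "llama.cpp/build", "-S", "llama.cpp"], check=True)
subprocess.run(
    ["cmake", "--build", "llama.cpp/build", "--config", "Release",
     "--target", "llama-quantize", "llama-cli", "llama-server"],
    check=True,
)
```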