Implementation: ggml-org/llama.cpp Quantize CMake Build
| Field | Value |
|---|---|
| Implementation Name | Quantize CMake Build |
| Doc Type | External Tool Doc |
| Topic | Model Quantization |
| Workflow | Model_Quantization |
| Category | Build System |
| Repository | Ggml_org_Llama_cpp |
Overview
Description
The llama-quantize build target is defined in the CMake build system and produces the llama-quantize executable binary. This target compiles quantize.cpp and links it against the common library (shared utilities), the llama library (core inference engine), and the platform threading library. The resulting binary is the primary command-line tool for converting full-precision GGUF model files into quantized formats.
Usage
To build the llama-quantize tool, first configure the project with CMake, then build just the llama-quantize target:
# Configure the project (from repository root)
cmake -B build
# Build only the llama-quantize target
cmake --build build --target llama-quantize
The resulting binary is located at build/bin/llama-quantize (or build/tools/quantize/llama-quantize depending on the CMake output layout).
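Because the output directory varies with the CMake version and generator settings, a small shell helper can probe both locations named above. This is an illustrative sketch, not part of the project; the function name and the list of layouts are assumptions:

```shell
# Return the path of the built llama-quantize binary, checking the two
# output layouts CMake commonly produces (illustrative helper).
find_quantize() {
    for p in "$1"/bin/llama-quantize "$1"/tools/quantize/llama-quantize; do
        if [ -x "$p" ]; then
            echo "$p"
            return 0
        fi
    done
    return 1
}
```

Usage: `BIN=$(find_quantize build) || echo "build the llama-quantize target first"`.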
Code Reference
Source Location
tools/quantize/CMakeLists.txt (lines 1-9)
Signature
set(TARGET llama-quantize)
add_executable(${TARGET} quantize.cpp)
target_link_libraries(${TARGET} PRIVATE common llama ${CMAKE_THREAD_LIBS_INIT})
target_include_directories(${TARGET} PRIVATE ../../common)
target_compile_features(${TARGET} PRIVATE cxx_std_17)
if(LLAMA_TOOLS_INSTALL)
    install(TARGETS ${TARGET} RUNTIME)
endif()
Import
This is a CMake build definition. No import statement is required. The target is invoked via the CMake build command:
cmake --build . --target llama-quantize
I/O Contract
| Direction | Type | Description |
|---|---|---|
| Input | quantize.cpp | Main C++ source file containing the quantize tool entry point |
| Input | common library | Shared utilities (argument parsing, logging, tokenization) |
| Input | llama library | Core llama.cpp library providing llama_model_quantize() |
| Input | CMAKE_THREAD_LIBS_INIT | Platform threading library (pthread on POSIX, Win32 threads on Windows) |
| Output | llama-quantize executable | Compiled binary that performs model quantization |
Usage Examples
Example 1: Full build and quantize workflow
# Clone and build
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --target llama-quantize
# Run the quantize tool
./build/bin/llama-quantize model-f16.gguf model-q4_0.gguf Q4_0
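To produce several quantization levels from one F16 model, the invocation above can be wrapped in a loop. This is a hedged sketch: quantize_all, the QUANTIZE_BIN override, and the -f16.gguf naming scheme are assumptions for illustration, not project conventions:

```shell
# Quantize one F16 GGUF into each requested type, deriving the output name
# from the type (e.g. model-f16.gguf + Q4_K_M -> model-q4_k_m.gguf).
quantize_all() {
    in="$1"; shift
    for t in "$@"; do
        out="${in%-f16.gguf}-$(printf '%s' "$t" | tr '[:upper:]' '[:lower:]').gguf"
        "${QUANTIZE_BIN:-./build/bin/llama-quantize}" "$in" "$out" "$t"
    done
}
```

Usage: `quantize_all model-f16.gguf Q4_0 Q4_K_M Q8_0`.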
Example 2: Build with specific compiler and parallel jobs
cmake -B build -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++
cmake --build build --target llama-quantize -j$(nproc)
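Note that nproc is a GNU coreutils command and is absent on macOS, so a portable job count can be computed first. A sketch; the fallback value of 4 is an arbitrary assumption:

```shell
# Pick a parallel job count that works on both Linux (nproc) and macOS
# (sysctl hw.ncpu), defaulting to 4 if neither command is available.
JOBS=$( { nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4; } | head -n 1 )
```

Then build with `cmake --build build --target llama-quantize -j"$JOBS"`.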
Example 3: Install the tool system-wide
cmake -B build -DLLAMA_TOOLS_INSTALL=ON -DCMAKE_INSTALL_PREFIX=/usr/local
cmake --build build --target llama-quantize
cmake --install build  # installing to /usr/local typically requires sudo
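With the default RUNTIME destination, the binary lands at &lt;prefix&gt;/bin/llama-quantize. A quick post-install check can be sketched as follows; check_install is an illustrative helper, not part of the project:

```shell
# Verify that llama-quantize was installed under the given prefix.
check_install() {
    if [ -x "$1/bin/llama-quantize" ]; then
        echo "installed: $1/bin/llama-quantize"
    else
        echo "not found under $1/bin"
    fi
}
```

Usage: `check_install /usr/local`.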