Implementation: ggml-org/llama.cpp Quantize CMake Build
| Field | Value |
|---|---|
| Implementation Name | Quantize CMake Build |
| Doc Type | External Tool Doc |
| Topic | Model Quantization |
| Workflow | Model_Quantization |
| Category | Build System |
| Repository | Ggml_org_Llama_cpp |
Overview
Description
The llama-quantize build target is defined in the CMake build system and produces the llama-quantize executable binary. This target compiles quantize.cpp and links it against the common library (shared utilities), the llama library (core inference engine), and the platform threading library. The resulting binary is the primary command-line tool for converting full-precision GGUF model files into quantized formats.
Usage
To build the llama-quantize tool, first configure the project with CMake, then build just the llama-quantize target:
# Configure the project (from repository root)
cmake -B build
# Build only the llama-quantize target
cmake --build build --target llama-quantize
The resulting binary is located at build/bin/llama-quantize (or build/tools/quantize/llama-quantize depending on the CMake output layout).
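Because the output directory varies with the CMake version and generator settings, a small shell helper can probe both locations named above. This is an illustrative sketch, not part of the project; the function name and the list of layouts are assumptions:

```shell
# Return the path of the built llama-quantize binary, checking the two
# output layouts CMake commonly produces (illustrative helper).
find_quantize() {
    for p in "$1"/bin/llama-quantize "$1"/tools/quantize/llama-quantize; do
        if [ -x "$p" ]; then
            echo "$p"
            return 0
        fi
    done
    return 1
}
```

Usage: `BIN=$(find_quantize build) || echo "build the llama-quantize target first"`.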
Code Reference
Source Location
tools/quantize/CMakeLists.txt (lines 1-9)
Signature
set(TARGET llama-quantize)
add_executable(${TARGET} quantize.cpp)
target_link_libraries(${TARGET} PRIVATE common llama ${CMAKE_THREAD_LIBS_INIT})
target_include_directories(${TARGET} PRIVATE ../../common)
target_compile_features(${TARGET} PRIVATE cxx_std_17)
if(LLAMA_TOOLS_INSTALL)
    install(TARGETS ${TARGET} RUNTIME)
endif()
Import
This is a CMake build definition. No import statement is required. The target is invoked via the CMake build command:
cmake --build . --target llama-quantize
I/O Contract
| Direction | Type | Description |
|---|---|---|
| Input | quantize.cpp | Main C++ source file containing the quantize tool entry point |
| Input | common library | Shared utilities (argument parsing, logging, tokenization) |
| Input | llama library | Core llama.cpp library providing llama_model_quantize() |
| Input | CMAKE_THREAD_LIBS_INIT | Platform threading library (pthread on POSIX, Win32 threads on Windows) |
| Output | llama-quantize executable | Compiled binary that performs model quantization |
Usage Examples
Example 1: Full build and quantize workflow
# Clone and build
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --target llama-quantize
# Run the quantize tool
./build/bin/llama-quantize model-f16.gguf model-q4_0.gguf Q4_0
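To produce several quantization levels from one F16 model, the invocation above can be wrapped in a loop. This is a hedged sketch: quantize_all, the QUANTIZE_BIN override, and the -f16.gguf naming scheme are assumptions for illustration, not project conventions:

```shell
# Quantize one F16 GGUF into each requested type, deriving the output name
# from the type (e.g. model-f16.gguf + Q4_K_M -> model-q4_k_m.gguf).
quantize_all() {
    in="$1"; shift
    for t in "$@"; do
        out="${in%-f16.gguf}-$(printf '%s' "$t" | tr '[:upper:]' '[:lower:]').gguf"
        "${QUANTIZE_BIN:-./build/bin/llama-quantize}" "$in" "$out" "$t"
    done
}
```

Usage: `quantize_all model-f16.gguf Q4_0 Q4_K_M Q8_0`.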
Example 2: Build with specific compiler and parallel jobs
cmake -B build -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++
cmake --build build --target llama-quantize -j$(nproc)
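Note that nproc is a GNU coreutils command and is absent on macOS, so a portable job count can be computed first. A sketch; the fallback value of 4 is an arbitrary assumption:

```shell
# Pick a parallel job count that works on both Linux (nproc) and macOS
# (sysctl hw.ncpu), defaulting to 4 if neither command is available.
JOBS=$( { nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4; } | head -n 1 )
```

Then build with `cmake --build build --target llama-quantize -j"$JOBS"`.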
Example 3: Install the tool system-wide
cmake -B build -DLLAMA_TOOLS_INSTALL=ON -DCMAKE_INSTALL_PREFIX=/usr/local
cmake --build build --target llama-quantize
cmake --install build  # installing to /usr/local typically requires sudo
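With the default RUNTIME destination, the binary lands at &lt;prefix&gt;/bin/llama-quantize. A quick post-install check can be sketched as follows; check_install is an illustrative helper, not part of the project:

```shell
# Verify that llama-quantize was installed under the given prefix.
check_install() {
    if [ -x "$1/bin/llama-quantize" ]; then
        echo "installed: $1/bin/llama-quantize"
    else
        echo "not found under $1/bin"
    fi
}
```

Usage: `check_install /usr/local`.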