
Implementation:Ggml org Llama cpp Quantize CMake Build

From Leeroopedia
Field | Value
Implementation Name | Quantize CMake Build
Doc Type | External Tool Doc
Topic | Model Quantization
Workflow | Model_Quantization
Category | Build System
Repository | Ggml_org_Llama_cpp

Overview

Description

The llama-quantize build target is defined in the CMake build system and produces the llama-quantize executable binary. This target compiles quantize.cpp and links it against the common library (shared utilities), the llama library (core inference engine), and the platform threading library. The resulting binary is the primary command-line tool for converting full-precision GGUF model files into quantized formats.

Usage

To build the llama-quantize tool, first configure the project with CMake, then build only the llama-quantize target:

# Configure the project (from repository root)
cmake -B build

# Build only the llama-quantize target
cmake --build build --target llama-quantize

The resulting binary is located at build/bin/llama-quantize (or build/tools/quantize/llama-quantize depending on the CMake output layout).
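Because the output path can vary with the CMake version and generator, a quick way to locate the binary is a `find` over the build tree. The sketch below fabricates a placeholder file so the commands run anywhere; in a real checkout, run only the `find` line from the repository root after building:

```shell
# Simulate one possible output layout with a placeholder file;
# in a real build this file is produced by CMake.
mkdir -p build/bin
touch build/bin/llama-quantize

# Locate the llama-quantize binary wherever the build placed it.
find build -type f -name llama-quantize
```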

Code Reference

Source Location

tools/quantize/CMakeLists.txt (lines 1-9)

Signature

set(TARGET llama-quantize)
add_executable(${TARGET} quantize.cpp)
target_link_libraries(${TARGET} PRIVATE common llama ${CMAKE_THREAD_LIBS_INIT})
target_include_directories(${TARGET} PRIVATE ../../common)
target_compile_features(${TARGET} PRIVATE cxx_std_17)

if(LLAMA_TOOLS_INSTALL)
    install(TARGETS ${TARGET} RUNTIME)
endif()

Import

This is a CMake build definition, so no import statement applies. The target is built via the CMake build command:

cmake --build . --target llama-quantize

I/O Contract

Direction | Type | Description
Input | quantize.cpp | Main C++ source file containing the quantize tool entry point
Input | common library | Shared utilities (argument parsing, logging, tokenization)
Input | llama library | Core llama.cpp library providing llama_model_quantize()
Input | CMAKE_THREAD_LIBS_INIT | Platform threading library (pthread on POSIX, Win32 threads on Windows)
Output | llama-quantize executable | Compiled binary that performs model quantization

Usage Examples

Example 1: Full build and quantize workflow

# Clone and build
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --target llama-quantize

# Run the quantize tool
./build/bin/llama-quantize model-f16.gguf model-q4_0.gguf Q4_0

Example 2: Build with specific compiler and parallel jobs

cmake -B build -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++
cmake --build build --target llama-quantize -j$(nproc)
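Note that `nproc` is Linux-specific. One portable way to pick a job count (assuming macOS/BSD expose `sysctl -n hw.ncpu`; the build command itself is left commented so the snippet stands alone) is:

```shell
# Pick a parallel-job count: nproc on Linux, sysctl on macOS/BSD,
# otherwise fall back to a conservative default of 4.
jobs=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)
echo "building with $jobs jobs"

# Then run the target-only build with that count:
# cmake --build build --target llama-quantize -j"$jobs"
```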

Example 3: Install the tool system-wide

cmake -B build -DLLAMA_TOOLS_INSTALL=ON -DCMAKE_INSTALL_PREFIX=/usr/local
cmake --build build --target llama-quantize
cmake --install build
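After installation, the binary should land under `$CMAKE_INSTALL_PREFIX/bin` and be discoverable on PATH. A sketch of that check follows; the first three lines fabricate a stand-in binary so the snippet runs anywhere, and on a real system you would run only the final `command -v` line:

```shell
# Stand-in for an installed binary (a real install places it in,
# e.g., /usr/local/bin per CMAKE_INSTALL_PREFIX).
mkdir -p /tmp/demo-prefix/bin
printf '#!/bin/sh\necho stand-in\n' > /tmp/demo-prefix/bin/llama-quantize
chmod +x /tmp/demo-prefix/bin/llama-quantize

# Verify the tool is reachable on PATH.
PATH="/tmp/demo-prefix/bin:$PATH" command -v llama-quantize
```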
