
Implementation:Ggml org Ggml Ggml quantize chunk

From Leeroopedia



ggml_quantize_chunk

ggml_quantize_chunk is the core C function that performs block-wise quantization of floating-point weight data into a target GGML quantization format. It dispatches to type-specific quantization routines based on the requested type enum.
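To illustrate what block-wise quantization means, here is a minimal sketch of a Q8_0-style scheme. Each block of 32 floats gets one scale d = max|x| / 127, and each value is stored as round(x / d) in an int8. The sketch is modeled on our understanding of ggml's reference Q8_0 routine but is not ggml code. All names (sketch_block_q8, sketch_quantize_q8, sketch_dequantize_q8) are hypothetical. The scale is also kept as a float, whereas ggml stores it as fp16, which makes a real Q8_0 block 34 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_QK 32 /* elements per block, matching ggml's Q8_0 block size */

/* Hypothetical simplified block: float scale instead of ggml's fp16 scale. */
typedef struct {
    float  d;              /* per-block scale */
    int8_t qs[SKETCH_QK];  /* quantized values in [-127, 127] */
} sketch_block_q8;

/* Quantize n floats (n must be a multiple of SKETCH_QK) into n / SKETCH_QK blocks. */
static void sketch_quantize_q8(const float *x, sketch_block_q8 *y, int64_t n) {
    assert(n % SKETCH_QK == 0);
    for (int64_t i = 0; i < n / SKETCH_QK; ++i) {
        const float *xb = x + i * SKETCH_QK;

        float amax = 0.0f; /* largest absolute value in the block */
        for (int j = 0; j < SKETCH_QK; ++j) {
            const float a = xb[j] < 0.0f ? -xb[j] : xb[j];
            if (a > amax) amax = a;
        }

        const float d  = amax / 127.0f;               /* maps [-amax, amax] onto [-127, 127] */
        const float id = d != 0.0f ? 1.0f / d : 0.0f; /* all-zero block gives all-zero quants */
        y[i].d = d;
        for (int j = 0; j < SKETCH_QK; ++j) {
            const float v = xb[j] * id;
            y[i].qs[j] = (int8_t) (v >= 0.0f ? v + 0.5f : v - 0.5f); /* round half away from zero */
        }
    }
}

/* Dequantize: each value is reconstructed as d * q. */
static void sketch_dequantize_q8(const sketch_block_q8 *y, float *x, int64_t n) {
    for (int64_t i = 0; i < n / SKETCH_QK; ++i) {
        for (int j = 0; j < SKETCH_QK; ++j) {
            x[i * SKETCH_QK + j] = y[i].d * (float) y[i].qs[j];
        }
    }
}
```

Each reconstructed value differs from the original by at most about d / 2. A single large value in a block therefore coarsens the precision of every other value in that block.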

API Signature

size_t ggml_quantize_chunk(
    enum ggml_type   type,
    const float    * src,
    void           * dst,
    int64_t          start,
    int64_t          nrows,
    int64_t          n_per_row,
    const float    * imatrix
);

Source: src/ggml.c:L7537-7609

Repository: https://github.com/ggml-org/ggml

Parameters

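  • type (enum ggml_type): Target quantization type (e.g., GGML_TYPE_Q4_0, GGML_TYPE_Q8_0).
  • src (const float *): Base pointer of the float32 source data. The chunk is read starting at src + start.
  • dst (void *): Base pointer of the output buffer for the whole quantized tensor. The chunk is written starting at byte offset (start / n_per_row) * ggml_row_size(type, n_per_row).
  • start (int64_t): Index of the first source element to quantize (an element offset, not a row index). It must be a multiple of n_per_row and of the type's block size.
  • nrows (int64_t): Number of rows to quantize.
  • n_per_row (int64_t): Number of elements per row. It must be a multiple of the type's block size.
  • imatrix (const float *): Optional importance matrix with one weight per column (n_per_row values). Pass NULL for unweighted quantization. Some IQ types (e.g., GGML_TYPE_IQ2_XXS) require a non-NULL imatrix; ggml_quantize_requires_imatrix() reports which.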
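In the implementation, start is an element offset into src that must fall on a row boundary. This lets callers quantize a large tensor in row chunks, for example one chunk per thread. The following sketch shows the offset arithmetic and one simple way to split rows. sketch_dst_offset and sketch_split_rows are hypothetical helpers, and row_size stands for the value ggml_row_size(type, n_per_row) would return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset into dst for a chunk that begins at element index `start`
   (mirrors, as we understand it, the offset ggml_quantize_chunk applies). */
static size_t sketch_dst_offset(int64_t start, int64_t n_per_row, size_t row_size) {
    assert(start % n_per_row == 0); /* start must fall on a row boundary */
    return (size_t) (start / n_per_row) * row_size;
}

/* Hypothetical helper: split nrows into nthreads contiguous ranges.
   Thread ith gets rows [*first_row, *first_row + *count). */
static void sketch_split_rows(int64_t nrows, int nthreads, int ith,
                              int64_t *first_row, int64_t *count) {
    const int64_t per   = (nrows + nthreads - 1) / nthreads; /* ceil(nrows / nthreads) */
    int64_t       begin = per * ith;
    const int64_t end   = begin + per < nrows ? begin + per : nrows;
    if (begin > nrows) {
        begin = nrows; /* trailing threads get an empty range */
    }
    *first_row = begin;
    *count     = end > begin ? end - begin : 0;
}
```

Each thread with a non-empty range would call ggml_quantize_chunk(type, src, dst, first_row * n_per_row, count, n_per_row, imatrix). It passes the unshifted src and dst base pointers because the function applies both offsets itself. The per-thread return values then sum to the size of the whole quantized tensor.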

Return Value

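Returns the number of bytes written to dst, as a size_t. This always equals nrows * ggml_row_size(type, n_per_row). The function asserts that the type-specific routine produced exactly this many bytes.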

Dispatch Mechanism

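Inside a switch on type, the function calls type-specific quantization routines declared in ggml-quants.h and implemented in src/ggml-quants.c, such as:

  • quantize_q4_0
  • quantize_q4_1
  • quantize_q5_0
  • quantize_q5_1
  • quantize_q8_0
  • k-quant and IQ-type variants (e.g., quantize_q4_K, quantize_iq2_xxs)

The F32, F16, and BF16 targets are handled by a plain copy or conversion instead.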
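The per-type dispatch also fixes each format's block geometry, which determines the byte count the function returns. The sketch below uses a hypothetical mirror enum for the legacy types listed above; these are not ggml's enum ggml_type values. The block sizes and byte sizes reflect our understanding of ggml's block layouts: an fp16 scale, an fp16 minimum for the _1 variants, and packed quants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the legacy types (not ggml's enum values). */
enum sketch_type { SK_Q4_0, SK_Q4_1, SK_Q5_0, SK_Q5_1, SK_Q8_0 };

typedef struct {
    int64_t blck_size; /* elements per block */
    size_t  type_size; /* bytes per block */
} sketch_traits;

/* Dispatch on the type to get its block geometry. */
static sketch_traits sketch_get_traits(enum sketch_type t) {
    switch (t) {
        case SK_Q4_0: return (sketch_traits){32, 18}; /* fp16 d + 16 bytes of 4-bit quants */
        case SK_Q4_1: return (sketch_traits){32, 20}; /* fp16 d, fp16 m + 16 bytes */
        case SK_Q5_0: return (sketch_traits){32, 22}; /* fp16 d + 4 bytes of high bits + 16 bytes */
        case SK_Q5_1: return (sketch_traits){32, 24}; /* fp16 d, fp16 m + 4 + 16 bytes */
        case SK_Q8_0: return (sketch_traits){32, 34}; /* fp16 d + 32 int8 values */
    }
    assert(0 && "unknown type");
    return (sketch_traits){1, 0};
}

/* Bytes a chunk of nrows rows occupies, i.e. nrows * row size. */
static size_t sketch_expected_bytes(enum sketch_type t, int64_t nrows, int64_t n_per_row) {
    const sketch_traits tr = sketch_get_traits(t);
    assert(n_per_row % tr.blck_size == 0); /* rows must hold whole blocks */
    const size_t row_size = (size_t) (n_per_row / tr.blck_size) * tr.type_size;
    return (size_t) nrows * row_size;
}
```

For example, two rows of 4096 elements in Q4_0 occupy 2 * 128 * 18 = 4608 bytes. That is the value the function would return for such a chunk.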

Higher-Level Wrapper

The examples directory also provides a higher-level C++ wrapper:

bool ggml_common_quantize_0(
    std::ifstream              & finp,
    std::ofstream              & fout,
    const ggml_ftype             ftype,
    const std::vector<std::string> & to_quant,
    const std::vector<std::string> & to_skip
);

Source: examples/common-ggml.cpp:L41-240

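This wrapper reads the tensors of a GGML model file from finp and writes them to fout. A tensor is quantized to the type implied by ftype only if three conditions hold: it is 2D, its name matches at least one regular expression in to_quant, and it matches none in to_skip. All other tensors are written unchanged. The function returns false on error.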
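The tensor-selection rule can be sketched in C as follows. This is an approximation under stated assumptions: the wrapper itself is C++ and, as we understand it, uses std::regex_match. The sketch instead anchors POSIX extended regular expressions as ^(pattern)$ to get whole-name matching, and ERE syntax differs from std::regex's default ECMAScript grammar for some patterns. The function names and the example tensor names in the test are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Whole-string match: approximate std::regex_match by anchoring the pattern. */
static bool sketch_full_match(const char *pattern, const char *name) {
    char anchored[256];
    const int n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    assert(n >= 0 && (size_t) n < sizeof anchored); /* pattern must fit */

    regex_t re;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0) {
        return false; /* treat an invalid pattern as "no match" */
    }
    const bool ok = regexec(&re, name, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}

/* Quantize only if a to_quant pattern matches, no to_skip pattern matches, and the tensor is 2D. */
static bool sketch_should_quantize(const char *name, int n_dims,
                                   const char *const *to_quant, size_t n_quant,
                                   const char *const *to_skip, size_t n_skip) {
    bool quantize = false;
    for (size_t i = 0; i < n_quant && !quantize; ++i) {
        quantize = sketch_full_match(to_quant[i], name);
    }
    for (size_t i = 0; i < n_skip && quantize; ++i) {
        quantize = !sketch_full_match(to_skip[i], name); /* skip list takes precedence */
    }
    return quantize && n_dims == 2;
}
```

Checking the skip list after the quantize list means one to_skip entry can exempt a tensor that a broad to_quant pattern would otherwise select.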

Dependencies

  • ggml.h
  • ggml-quants.h

Import

#include "ggml.h"

Language

C
