Implementation:Ggml org Ggml Ggml quantize chunk

ggml_quantize_chunk

ggml_quantize_chunk is the core C function that performs block-wise quantization of floating-point weight data into a target GGML quantization format. It dispatches to type-specific quantization routines based on the requested type enum.
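To make "block-wise quantization" concrete, here is a minimal sketch in the style of an 8-bit block format such as Q8_0: each block of 32 floats is stored as one scale plus 32 signed bytes. This is not the ggml implementation. The names `sketch_block_q8` and `sketch_quantize_row_q8` are hypothetical, and the scale is kept as a `float` for portability, whereas ggml stores it in half precision.

```c
#include <stdint.h>

#define SKETCH_BLOCK 32  /* elements per block (assumed, as in Q8_0-style formats) */

/* Hypothetical block layout: one scale plus 32 signed 8-bit values. */
typedef struct {
    float  d;                 /* per-block scale (ggml uses fp16 here) */
    int8_t qs[SKETCH_BLOCK];  /* quantized values */
} sketch_block_q8;

/* Round to nearest, ties away from zero, without needing libm. */
static int8_t sketch_round_i8(float v) {
    return (int8_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}

/* Quantize n floats into n / SKETCH_BLOCK blocks.
 * n is assumed to be a multiple of SKETCH_BLOCK. */
static void sketch_quantize_row_q8(const float *x, sketch_block_q8 *y, int64_t n) {
    for (int64_t b = 0; b < n / SKETCH_BLOCK; b++) {
        const float *xb = x + b * SKETCH_BLOCK;

        /* Largest magnitude in the block determines the scale. */
        float amax = 0.0f;
        for (int j = 0; j < SKETCH_BLOCK; j++) {
            float v = xb[j] < 0.0f ? -xb[j] : xb[j];
            if (v > amax) amax = v;
        }

        float d  = amax / 127.0f;                 /* value of one quantized step */
        float id = d != 0.0f ? 1.0f / d : 0.0f;   /* avoid dividing by zero */

        y[b].d = d;
        for (int j = 0; j < SKETCH_BLOCK; j++) {
            y[b].qs[j] = sketch_round_i8(xb[j] * id);
        }
    }
}
```

Multiplying `qs[j]` by `d` recovers an approximation of the original value, which is why each block carries its own scale.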

API Signature

size_t ggml_quantize_chunk(
    enum ggml_type   type,
    const float    * src,
    void           * dst,
    int64_t          start,
    int64_t          nrows,
    int64_t          n_per_row,
    const float    * imatrix
);

Source: src/ggml.c:L7537-7609

Repository: https://github.com/ggml-org/ggml

Parameters

Parameter	Type	Description
`type`	`enum ggml_type`	Target quantization type (e.g., `GGML_TYPE_Q4_0`, `GGML_TYPE_Q8_0`)
`src`	`const float *`	Pointer to float32 source data
`dst`	`void *`	Output buffer for quantized data
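`start`	`int64_t`	Element offset of the first row to quantize; must be a multiple of `n_per_row` (and of the type's block size), so the starting row is `start / n_per_row`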
`nrows`	`int64_t`	Number of rows to quantize
`n_per_row`	`int64_t`	Number of elements per row
`imatrix`	`const float *`	Optional importance matrix; pass `NULL` for uniform quantization

Return Value

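Returns a size_t: the number of bytes written to dst, which equals nrows multiplied by the quantized row size for the given type and n_per_row (as reported by ggml_row_size).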
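The size and offset arithmetic behind the parameters and return value can be sketched as follows. The sketch assumes that `start` is an element offset that is a multiple of `n_per_row`, and that a row is stored as a whole number of fixed-size blocks. The struct and function names are hypothetical; the example layout of 32 elements and 34 bytes per block corresponds to a Q8_0-style block (a 2-byte scale plus 32 one-byte values).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor of a block-quantized format. */
typedef struct {
    int64_t block_elems;  /* elements per quantization block */
    size_t  block_bytes;  /* bytes per encoded block */
} sketch_layout;

/* Bytes needed to store one quantized row of n_per_row elements. */
static size_t sketch_row_size(sketch_layout l, int64_t n_per_row) {
    assert(n_per_row % l.block_elems == 0);  /* rows must hold whole blocks */
    return (size_t)(n_per_row / l.block_elems) * l.block_bytes;
}

/* Byte offset into dst for a chunk that begins at element offset start. */
static size_t sketch_dst_offset(sketch_layout l, int64_t start, int64_t n_per_row) {
    assert(start % n_per_row == 0);          /* chunks begin on a row boundary */
    return (size_t)(start / n_per_row) * sketch_row_size(l, n_per_row);
}

/* Bytes written when quantizing nrows rows: the expected return value. */
static size_t sketch_chunk_bytes(sketch_layout l, int64_t nrows, int64_t n_per_row) {
    return (size_t)nrows * sketch_row_size(l, n_per_row);
}
```

For rows of 4096 elements in this layout, one row takes 128 blocks of 34 bytes, so a caller splitting a tensor into chunks can size the output buffer and predict each chunk's byte count in advance.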

Dispatch Mechanism

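The function switches on type and calls the matching quantization routine defined in src/ggml-quants.c. Each routine takes the row count, row length, and optional importance matrix, and returns the number of bytes it wrote. Examples:

quantize_q4_0
quantize_q4_1
quantize_q5_0
quantize_q5_1
quantize_q8_0
k-quant and IQ-type variants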
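The dispatch pattern described above can be sketched as a switch over a type enum. Everything here is a hypothetical stand-in: the enum, the stub routines, and `sketch_dispatch` are not ggml names, and the stubs only report how many bytes a real routine would write (assuming 32-element blocks of 18 and 34 bytes) rather than encoding any data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type enum standing in for enum ggml_type. */
enum sketch_type { SKETCH_Q4, SKETCH_Q8, SKETCH_F32 };

/* Stub for a 4-bit routine: 32 elements per 18-byte block (assumed). */
static size_t sketch_quantize_q4(const float *src, void *dst, int64_t nrows,
                                 int64_t n_per_row, const float *imatrix) {
    (void)src; (void)dst; (void)imatrix;  /* a real routine would encode here */
    return (size_t)nrows * (size_t)(n_per_row / 32) * 18;
}

/* Stub for an 8-bit routine: 32 elements per 34-byte block (assumed). */
static size_t sketch_quantize_q8(const float *src, void *dst, int64_t nrows,
                                 int64_t n_per_row, const float *imatrix) {
    (void)src; (void)dst; (void)imatrix;
    return (size_t)nrows * (size_t)(n_per_row / 32) * 34;
}

/* Select the routine for the requested type and forward the arguments. */
static size_t sketch_dispatch(enum sketch_type type, const float *src, void *dst,
                              int64_t nrows, int64_t n_per_row,
                              const float *imatrix) {
    switch (type) {
        case SKETCH_Q4: return sketch_quantize_q4(src, dst, nrows, n_per_row, imatrix);
        case SKETCH_Q8: return sketch_quantize_q8(src, dst, nrows, n_per_row, imatrix);
        default:        return 0;  /* type not handled by this sketch */
    }
}
```

Keeping one entry point with a shared argument list means callers need no knowledge of the individual formats; adding a format is a matter of adding a case.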

Higher-Level Wrapper

A higher-level C++ wrapper is also provided:

bool ggml_common_quantize_0(
    std::ifstream              & finp,
    std::ofstream              & fout,
    const ggml_ftype             ftype,
    const std::vector<std::string> & to_quant,
    const std::vector<std::string> & to_skip
);

Source: examples/common-ggml.cpp:L41-240

This wrapper reads a GGML model file, iterates over tensors, selectively quantizes 2D weight tensors (skipping those in to_skip, targeting those in to_quant), and writes the quantized model to the output stream.
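The per-tensor selection rule described above can be sketched in C as follows. This is an illustration only: the wrapper itself is C++, the function names are hypothetical, and exact string comparison is a simplifying assumption (the real wrapper may match tensor names by pattern).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if name appears in list (exact match; a simplifying assumption). */
static bool sketch_name_in(const char *name, const char *const *list, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(name, list[i]) == 0) return true;
    }
    return false;
}

/* Decide whether a tensor should be quantized:
 * it must be targeted by to_quant, not excluded by to_skip, and be 2D. */
static bool sketch_should_quantize(const char *name, int n_dims,
                                   const char *const *to_quant, size_t n_quant,
                                   const char *const *to_skip, size_t n_skip) {
    bool quantize = sketch_name_in(name, to_quant, n_quant);
    if (sketch_name_in(name, to_skip, n_skip)) {
        quantize = false;  /* the skip list overrides the quantize list */
    }
    return quantize && n_dims == 2;
}
```

Tensors that fail this check would be copied to the output unchanged, so 1D tensors such as biases and normalization weights keep their original precision.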

Dependencies

ggml.h
ggml-quants.h

Import

#include "ggml.h"

Language

C
