Implementation:Ggml org Ggml Ggml quantize chunk
ggml_quantize_chunk
ggml_quantize_chunk is the core C function that performs block-wise quantization of floating-point weight data into a target GGML quantization format. It dispatches to type-specific quantization routines based on the requested type enum.
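To make "block-wise quantization" concrete, the following is a minimal, self-contained sketch in the spirit of an 8-bit block format: each block of 32 floats gets one scale, and every value is stored as a signed 8-bit integer relative to that scale. The names sketch_block_q8 and sketch_quantize_q8 are hypothetical, and the float scale is a simplification (GGML's Q8_0 stores the scale as fp16); this is an illustration, not the library's code.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BLOCK 32  /* elements per block (same block length as GGML's Q8_0) */

/* Hypothetical block layout; the scale is kept as float here for simplicity. */
typedef struct {
    float  d;                 /* per-block scale */
    int8_t qs[SKETCH_BLOCK];  /* quantized values */
} sketch_block_q8;

/* Quantize n floats (n must be a multiple of SKETCH_BLOCK).
 * Returns the number of bytes written to dst. */
static size_t sketch_quantize_q8(const float *src, sketch_block_q8 *dst, int64_t n) {
    const int64_t nb = n / SKETCH_BLOCK;
    for (int64_t i = 0; i < nb; i++) {
        const float *x = src + i * SKETCH_BLOCK;

        /* Largest magnitude in the block determines the scale. */
        float amax = 0.0f;
        for (int j = 0; j < SKETCH_BLOCK; j++) {
            const float v = x[j] < 0.0f ? -x[j] : x[j];
            if (v > amax) amax = v;
        }

        const float d  = amax / 127.0f;
        const float id = d != 0.0f ? 1.0f / d : 0.0f;
        dst[i].d = d;

        /* Round to nearest (half away from zero) without needing libm. */
        for (int j = 0; j < SKETCH_BLOCK; j++) {
            const float s = x[j] * id;
            dst[i].qs[j] = (int8_t)(s >= 0.0f ? s + 0.5f : s - 0.5f);
        }
    }
    return (size_t) nb * sizeof(sketch_block_q8);
}
```

Dequantizing a value is then simply dst[i].d * dst[i].qs[j], which is why the per-block scale has to be stored alongside the integers.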
API Signature
size_t ggml_quantize_chunk(
    enum ggml_type   type,
    const float    * src,
    void           * dst,
    int64_t          start,
    int64_t          nrows,
    int64_t          n_per_row,
    const float    * imatrix
);
Source: src/ggml.c:L7537-7609
Repository: https://github.com/ggml-org/ggml
Parameters
| Parameter | Type | Description |
|---|---|---|
| type | enum ggml_type | Target quantization type (e.g., GGML_TYPE_Q4_0, GGML_TYPE_Q8_0) |
| src | const float * | Pointer to float32 source data |
| dst | void * | Output buffer for quantized data |
| start | int64_t | Element offset into src at which the chunk begins; it must fall on a row boundary (a multiple of n_per_row) |
| nrows | int64_t | Number of rows to quantize |
| n_per_row | int64_t | Number of elements per row |
| imatrix | const float * | Optional importance matrix; pass NULL to quantize without importance weighting |
Return Value
Returns size_t -- the number of bytes written to dst.
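For sizing dst and sanity-checking the return value, the byte count can be estimated from the block layout of the target type. The sketch below assumes a hypothetical sketch_format descriptor; the block figures used in the usage note (32 elements per block, 18 bytes for Q4_0 and 34 bytes for Q8_0) follow GGML's block layouts, and n_per_row is assumed to be a multiple of the block length.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor of a block-quantized format. */
typedef struct {
    int64_t block_elems;  /* elements covered by one block */
    size_t  block_bytes;  /* bytes occupied by one block   */
} sketch_format;

/* Bytes needed to store one quantized row of n_per_row elements. */
static size_t sketch_row_size(sketch_format f, int64_t n_per_row) {
    return (size_t)(n_per_row / f.block_elems) * f.block_bytes;
}

/* Bytes a chunk of nrows rows is expected to occupy in dst. */
static size_t sketch_chunk_bytes(sketch_format f, int64_t nrows, int64_t n_per_row) {
    return (size_t) nrows * sketch_row_size(f, n_per_row);
}
```

With sketch_format q8_0 = {32, 34}, a row of 4096 elements takes 128 blocks, i.e. 4352 bytes, so two rows should yield a return value of 8704. In real code, ggml.h provides ggml_row_size(type, ne) for the same calculation, which avoids hard-coding block sizes.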
Dispatch Mechanism
The function dispatches to type-specific quantization routines defined in src/ggml-quants.c, such as:
- quantize_row_q4_0
- quantize_row_q4_1
- quantize_row_q5_0
- quantize_row_q5_1
- quantize_row_q8_0
- k-quant and IQ-type variants
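The dispatch pattern itself is a switch over the type enum that forwards to one routine per format and returns that routine's byte count. The sketch below illustrates only that shape: the enum, the two stand-in routines, and sketch_quantize_chunk are all hypothetical and do not reproduce any GGML format or the real function body.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type enum standing in for enum ggml_type. */
typedef enum { SKETCH_TYPE_A, SKETCH_TYPE_B } sketch_type;

/* Stand-in routine A: one signed byte per element (truncating cast). */
static size_t sketch_quantize_a(const float *src, void *dst,
                                int64_t nrows, int64_t n_per_row) {
    int8_t *out = (int8_t *) dst;
    const int64_t n = nrows * n_per_row;
    for (int64_t i = 0; i < n; i++) out[i] = (int8_t) src[i];
    return (size_t) n * sizeof(int8_t);
}

/* Stand-in routine B: two bytes per element (8.8 fixed point). */
static size_t sketch_quantize_b(const float *src, void *dst,
                                int64_t nrows, int64_t n_per_row) {
    int16_t *out = (int16_t *) dst;
    const int64_t n = nrows * n_per_row;
    for (int64_t i = 0; i < n; i++) out[i] = (int16_t)(src[i] * 256.0f);
    return (size_t) n * sizeof(int16_t);
}

/* Dispatch on the requested type and return the bytes written. */
static size_t sketch_quantize_chunk(sketch_type type, const float *src, void *dst,
                                    int64_t nrows, int64_t n_per_row) {
    switch (type) {
        case SKETCH_TYPE_A: return sketch_quantize_a(src, dst, nrows, n_per_row);
        case SKETCH_TYPE_B: return sketch_quantize_b(src, dst, nrows, n_per_row);
        default:            return 0;  /* sketch only: unsupported type writes nothing */
    }
}
```

Keeping every routine behind the same signature is what lets a single entry point serve all formats; adding a format means adding one case.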
Higher-Level Wrapper
A higher-level C++ wrapper is also provided:
bool ggml_common_quantize_0(
    std::ifstream & finp,
    std::ofstream & fout,
    const ggml_ftype ftype,
    const std::vector<std::string> & to_quant,
    const std::vector<std::string> & to_skip
);
Source: examples/common-ggml.cpp:L41-240
This wrapper reads a GGML model file, iterates over tensors, selectively quantizes 2D weight tensors (skipping those in to_skip, targeting those in to_quant), and writes the quantized model to the output stream.
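The per-tensor selection rule described above can be sketched as a small predicate. This is an illustration in C under stated assumptions: sketch_should_quantize and sketch_name_in are hypothetical names, and names are compared exactly here, whereas the actual wrapper may treat the entries of to_quant and to_skip as patterns.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if name equals one of the n entries in list (exact match; an assumption). */
static bool sketch_name_in(const char *name, const char *const *list, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(name, list[i]) == 0) return true;
    }
    return false;
}

/* Quantize only 2D tensors that are listed in to_quant and not in to_skip. */
static bool sketch_should_quantize(const char *name, int n_dims,
                                   const char *const *to_quant, size_t n_quant,
                                   const char *const *to_skip,  size_t n_skip) {
    if (n_dims != 2) return false;                            /* only 2D weights */
    if (sketch_name_in(name, to_skip, n_skip)) return false;  /* skip list wins  */
    return sketch_name_in(name, to_quant, n_quant);
}
```

Tensors for which the predicate is false would be copied to the output unchanged, so the resulting file still contains every tensor of the original model.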
Dependencies
- ggml.h
- ggml-quants.h
Import
#include "ggml.h"
Language
C