Implementation:Ggml org Llama cpp IMatrixCollector
| Field | Value |
|---|---|
| Implementation Name | IMatrixCollector |
| Doc Type | Wrapper Doc |
| Topic | Model Quantization |
| Workflow | Model_Quantization |
| Category | Calibration Data |
| Repository | Ggml_org_Llama_cpp |
Overview
Description
The IMatrixCollector class implements importance matrix data collection by intercepting tensor operations during model inference. Its primary method, collect_imatrix(), serves as the evaluation callback that the ggml backend scheduler invokes for graph nodes during inference. The collector accepts only matrix multiplication operations (GGML_OP_MUL_MAT and GGML_OP_MUL_MAT_ID). For every accepted operation it accumulates the squared activation values feeding the weight tensor, summed across all forward passes. The result is a statistical profile of weight importance that is later used to decide where quantization precision is spent.
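The accumulation for a dense (non-MoE) weight can be sketched as follows. This is a minimal sketch, assuming one running sum of squared activations and one count per input column. DenseStats, accumulate_activations, and importance are hypothetical names, and the real Stats layout and counting granularity in imatrix.cpp may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified statistics for one dense weight (assumption: one
// running sum and one count per input column).
struct DenseStats {
    std::vector<float>   sum_sq;  // sum of x_j^2 over all tokens seen
    std::vector<int64_t> counts;  // number of tokens contributing to sum_sq[j]
};

// Accumulate a row-major activation batch (the src1 operand of a MUL_MAT):
// n_tokens rows, each n_embd floats wide.
void accumulate_activations(DenseStats & s, const float * x, size_t n_embd, size_t n_tokens) {
    if (s.sum_sq.empty()) {
        s.sum_sq.assign(n_embd, 0.0f);
        s.counts.assign(n_embd, 0);
    }
    assert(s.sum_sq.size() == n_embd);  // the same weight always sees the same width
    for (size_t row = 0; row < n_tokens; ++row) {
        const float * xr = x + row * n_embd;
        for (size_t j = 0; j < n_embd; ++j) {
            s.sum_sq[j] += xr[j] * xr[j];
            s.counts[j] += 1;
        }
    }
}

// Importance of input column j: mean squared activation over all tokens seen.
std::vector<float> importance(const DenseStats & s) {
    std::vector<float> out(s.sum_sq.size(), 0.0f);
    for (size_t j = 0; j < out.size(); ++j) {
        if (s.counts[j] > 0) {
            out[j] = s.sum_sq[j] / static_cast<float>(s.counts[j]);
        }
    }
    return out;
}
```

Storing raw sums and counts rather than running means keeps the statistics additive. Later batches, or previously saved runs, can be folded in without loss.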
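The class also manages persistence through save_imatrix() and load_imatrix(). These store importance data in GGUF format, with metadata keys recording dataset provenance, chunk count, and chunk size. The save_imatrix_legacy() and load_imatrix_legacy() methods handle the older, non-GGUF imatrix format.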
Usage
The IMatrixCollector is instantiated by the llama-imatrix tool, configured with runtime parameters, and registered as an evaluation callback. During inference over calibration text, it automatically collects importance statistics for all relevant weight tensors.
Code Reference
Source Location
- Class definition: tools/imatrix/imatrix.cpp (lines 60-78)
- Metadata constants: tools/imatrix/imatrix.cpp (lines 36-38)
- collect_imatrix method: tools/imatrix/imatrix.cpp (lines 219-340+)
Signature
```cpp
class IMatrixCollector {
public:
    IMatrixCollector() = default;
    void set_params(common_params params) { m_params = std::move(params); }
    bool collect_imatrix(struct ggml_tensor * t, bool ask, void * user_data);
    void save_imatrix_legacy(int32_t ncall = -1) const;
    void save_imatrix(int32_t n_chunk = -1) const;
    bool load_imatrix_legacy(const char * fname);
    bool load_imatrix(const char * file_name);
    const std::unordered_map<std::string, Stats> & get_mstats() const { return m_stats; }
private:
    std::unordered_map<std::string, Stats> m_stats;
    common_params                          m_params;
    std::mutex                             m_mutex;
    std::vector<std::string>               m_datasets;
    int32_t                                m_last_chunk = 0;
    std::vector<char>                      m_src1_data;
    std::vector<char>                      m_ids; // the expert ids from ggml_mul_mat_id
};
```
Supporting data structures:

```cpp
struct Stats {
    std::vector<float>   values;
    std::vector<int64_t> counts;
};

// Metadata keys for imatrix GGUF files
static const char * const LLM_KV_IMATRIX_DATASETS    = "imatrix.datasets";
static const char * const LLM_KV_IMATRIX_CHUNK_COUNT = "imatrix.chunk_count";
static const char * const LLM_KV_IMATRIX_CHUNK_SIZE  = "imatrix.chunk_size";
```
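For MoE weights reached through GGML_OP_MUL_MAT_ID, a single Stats entry must cover every expert, which is why the class buffers the expert ids in m_ids. This is a minimal sketch of one plausible layout. It assumes values holds one block of per-column sums per expert (expert-major) and counts holds one entry per expert. MoeStats and accumulate_moe are hypothetical names, and top-1 routing is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same fields as the documented Stats struct; the layout is an assumption.
struct MoeStats {
    std::vector<float>   values;  // n_expert blocks of n_embd sums, expert-major
    std::vector<int64_t> counts;  // tokens routed to each expert
};

// Hypothetical top-1 routing: expert_ids[t] is the single expert chosen for
// token t (real MoE layers usually route each token to several experts).
void accumulate_moe(MoeStats & s, const float * x, const int32_t * expert_ids,
                    size_t n_embd, size_t n_tokens, size_t n_expert) {
    if (s.values.empty()) {
        s.values.assign(n_expert * n_embd, 0.0f);
        s.counts.assign(n_expert, 0);
    }
    for (size_t t = 0; t < n_tokens; ++t) {
        const size_t e = static_cast<size_t>(expert_ids[t]);
        assert(e < n_expert);
        float * dst = s.values.data() + e * n_embd;  // this expert's block
        const float * xr = x + t * n_embd;           // this token's activations
        for (size_t j = 0; j < n_embd; ++j) {
            dst[j] += xr[j] * xr[j];
        }
        s.counts[e] += 1;
    }
}
```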
Import
```cpp
#include "common.h"
#include "llama.h"
#include "gguf.h"
```
I/O Contract
| Direction | Type | Description |
|---|---|---|
| Input (t) | struct ggml_tensor * | The tensor operation being evaluated; t->src[0] contains the weight tensor and t->src[1] the activation tensor (for GGML_OP_MUL_MAT_ID, t->src[2] holds the selected expert ids) |
| Input (ask) | bool | When true, the scheduler asks whether the collector is interested in this tensor's data; when false, the computed tensor is passed in and actual data collection occurs |
| Input (user_data) | void * | User-provided context pointer (unused in the current implementation) |
| Output | bool | When ask=true: returns true if the collector wants data for this tensor. When ask=false: returns true after successful collection, which lets graph evaluation continue (under the ggml scheduler contract, returning false cancels the graph compute) |
| Side Effect | m_stats map | Accumulated squared activation values and counts per weight tensor name |
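The ask/observe handshake in the table can be illustrated with a self-contained mock. FakeNode, EvalCallback, run_graph, and observe_mul_mat are hypothetical stand-ins, not ggml types. The callback only mirrors the (tensor, ask, user_data) -> bool shape of the scheduler's eval callback. The real scheduler also batches the nodes it was not asked about into one compute call before calling back with ask=false.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a graph node.
struct FakeNode {
    std::string op;
    std::string weight_name;
};

using EvalCallback = bool (*)(const FakeNode * t, bool ask, void * user_data);

// Mimics the contract: ask first; nodes the callback wants are passed back
// with ask == false, and a false return from that call stops evaluation.
// Returns how many nodes were (pretend-)computed.
size_t run_graph(const std::vector<FakeNode> & graph, EvalCallback cb, void * user_data) {
    size_t computed = 0;
    for (const FakeNode & node : graph) {
        ++computed;  // pretend the node is evaluated here
        if (cb(&node, /*ask=*/true, user_data)) {
            if (!cb(&node, /*ask=*/false, user_data)) {
                break;
            }
        }
    }
    return computed;
}

// Example collector: interested only in matrix multiplications; records the
// weight names it observed into the vector passed through user_data.
bool observe_mul_mat(const FakeNode * t, bool ask, void * user_data) {
    if (ask) {
        return t->op == "MUL_MAT" || t->op == "MUL_MAT_ID";
    }
    auto * seen = static_cast<std::vector<std::string> *>(user_data);
    seen->push_back(t->weight_name);
    return true;  // keep evaluating the graph
}
```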
Tensor filtering logic (when ask=true):
- Always collects GGML_OP_MUL_MAT_ID operations (MoE expert routing)
- For GGML_OP_MUL_MAT: requires a batch size of at least 16 tokens, F32 activations, and a weight tensor name that starts with "blk." or is "output.weight" (the latter only if process_output is enabled)
- Rejects all other operation types
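These filtering rules amount to a small predicate. This sketch restates them over a hypothetical flattened NodeInfo struct rather than a real ggml_tensor. In the real callback the fields would roughly correspond to t->op, the activation batch width (src1->ne[1]), src1->type, and the weight name (src0->name). Op, NodeInfo, and wants_tensor are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

enum class Op { MulMat, MulMatId, Other };

// Hypothetical flattened view of what the ask=true branch inspects.
struct NodeInfo {
    Op          op;
    int64_t     n_tokens;     // tokens in the activation batch
    bool        src1_is_f32;  // activations stored as F32
    std::string weight_name;  // name of the weight tensor
};

bool wants_tensor(const NodeInfo & n, bool process_output) {
    if (n.op == Op::MulMatId) return true;   // every MoE expert matmul
    if (n.op != Op::MulMat)   return false;  // ignore all other ops
    if (n.n_tokens < 16 || !n.src1_is_f32) return false;
    const bool is_block  = n.weight_name.rfind("blk.", 0) == 0;  // starts with "blk."
    const bool is_output = process_output && n.weight_name == "output.weight";
    return is_block || is_output;
}
```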
Usage Examples
Example 1: Command-line imatrix generation
```bash
# Generate importance matrix from calibration text
# (-c sets the context size, which is also the chunk size recorded in imatrix.chunk_size)
./llama-imatrix \
    -m model-f16.gguf \
    -f calibration-text.txt \
    -o imatrix.gguf \
    --output-format gguf \
    -c 512
```
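Chunking can be pictured with a minimal sketch. It assumes the tokenized calibration text is cut into consecutive chunks of the context size, that a trailing partial chunk is dropped, and that a start index (as set by --chunk N, which starts processing from chunk N rather than setting the chunk size) skips the leading chunks. make_chunks is a hypothetical helper, not a function from imatrix.cpp. Under these assumptions, the number of chunks processed is what imatrix.chunk_count would report.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: split a token stream into fixed-size chunks, starting
// at chunk index first_chunk and dropping a trailing partial chunk.
std::vector<std::vector<int32_t>> make_chunks(const std::vector<int32_t> & tokens,
                                              size_t chunk_size, size_t first_chunk) {
    assert(chunk_size > 0);
    std::vector<std::vector<int32_t>> chunks;
    const size_t n_full = tokens.size() / chunk_size;  // whole chunks only
    for (size_t c = first_chunk; c < n_full; ++c) {
        auto begin = tokens.begin() + static_cast<std::ptrdiff_t>(c * chunk_size);
        chunks.emplace_back(begin, begin + static_cast<std::ptrdiff_t>(chunk_size));
    }
    return chunks;
}
```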
Example 2: Using imatrix with quantization
```bash
# First generate the importance matrix
./llama-imatrix -m model-f16.gguf -f wiki.train.raw -o imatrix.gguf
# Then quantize using the importance matrix
./llama-quantize --imatrix imatrix.gguf model-f16.gguf model-iq4_xs.gguf IQ4_XS
```
Example 3: Programmatic usage of collect_imatrix callback
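```cpp
IMatrixCollector collector;
collector.set_params(params);
// Register as eval callback; a capture-less lambda converts to ggml_backend_sched_eval_callback
params.cb_eval = [](struct ggml_tensor * t, bool ask, void * user_data) -> bool {
    return static_cast<IMatrixCollector *>(user_data)->collect_imatrix(t, ask, user_data);
};
params.cb_eval_user_data = &collector;
// ... create the context from params and run inference over the calibration text ...
// After inference completes, save the collected data
collector.save_imatrix(n_chunks_processed);
```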
Example 4: Incremental collection with previously saved data
```bash
# Load a previously computed imatrix and continue collecting
./llama-imatrix \
    -m model-f16.gguf \
    -f additional-text.txt \
    --in-file imatrix-prev.gguf \
    -o imatrix-combined.gguf
```
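Incremental collection works because each entry stores raw sums and counts rather than means. This is a minimal sketch of folding previously saved statistics into a running map. It assumes loading behaves as element-wise addition per tensor name. Stats mirrors the documented struct; StatsMap and merge_stats are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct Stats {
    std::vector<float>   values;
    std::vector<int64_t> counts;
};

using StatsMap = std::unordered_map<std::string, Stats>;

// Add src into dst tensor by tensor. Tensors missing from dst are copied.
// Returns false on a shape mismatch (entries merged before it stay merged).
bool merge_stats(StatsMap & dst, const StatsMap & src) {
    for (const auto & kv : src) {
        const Stats & in = kv.second;
        Stats & out = dst[kv.first];
        if (out.values.empty()) {
            out = in;
            continue;
        }
        if (out.values.size() != in.values.size() || out.counts.size() != in.counts.size()) {
            return false;
        }
        for (size_t i = 0; i < in.values.size(); ++i) out.values[i] += in.values[i];
        for (size_t i = 0; i < in.counts.size(); ++i) out.counts[i] += in.counts[i];
    }
    return true;
}
```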