Implementation:Ggml org Ggml Cpu weight repack

Revision as of 15:01, 16 February 2026 by Admin (auto-imported from implementations/Ggml_org_Ggml_Cpu_weight_repack.md)


Metadata

Field | Value
Page Type | Implementation (Weight Repacking)
Knowledge Sources | GGML
Domains | ML_Infrastructure, Tensor_Computing, CPU_Backend, Quantization
Last Updated | 2025-05-15 12:00 GMT

Overview

Implements weight repacking for optimized quantized GEMM/GEMV kernels, converting standard quantization block layouts into interleaved formats for better SIMD utilization.

Description

repack.cpp converts standard quantized weight layouts into interleaved block formats that enable faster SIMD processing. It provides:

  1. Interleaved quantization: Generic implementations of ggml_quantize_mat_q8_0_4x4, ggml_quantize_mat_q8_0_4x8, and the corresponding q8_K variants. Each call quantizes 4 rows of float data together into interleaved blocks (e.g. block_q8_0x4): the 4 per-row deltas are stored first, followed by the quants of all 4 rows interleaved in fixed-size chunks. Repacked weights use analogous interleaved layouts (block_q4_0x4, block_q4_0x8).
  2. Optimized GEMV/GEMM kernels: Implements ggml_gemv_q4_0_4x4_q8_0, ggml_gemm_q4_0_4x4_q8_0, and variants for q4_K, q5_K, q6_K, iq4_nl, q8_0, q2_K formats. These operate on the repacked data for better cache utilization and vectorization.
  3. Backend buffer type: Registers as an extra_buffer_type with custom tensor_traits that intercept GGML_OP_MUL_MAT operations and redirect them to the repacked kernel implementations.
  4. Architecture fallback: The portable reference implementations carry the _generic suffix, and arch-fallback.h aliases them to the unsuffixed names on architectures that lack an optimized version. Architecture-specific implementations in arch/arm/repack.cpp, arch/x86/repack.cpp, etc., provide those names when available.
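The interleaving in item 1 can be sketched in plain C. This is a minimal standalone sketch modeled on the scalar generic path, not ggml's code: the names sketch_q8_0x4, sketch_round, and sketch_quantize_4x4 are hypothetical, deltas are stored as float rather than FP16 ggml_half so no ggml headers are needed, and the chunk width is assumed to be 4 bytes (the 4x4 variant).

```c
#include <assert.h>
#include <stdint.h>

#define SK_QK8_0 32  /* elements per Q8_0 block (ggml's QK8_0) */
#define SK_ROWS  4   /* rows quantized together */
#define SK_CHUNK 4   /* interleave chunk in bytes: assumed 4 for the 4x4 variant */

/* Hypothetical stand-in for block_q8_0x4: 4 per-row deltas first, then
   4 * 32 quants interleaved in SK_CHUNK-byte chunks. Deltas are float
   here, not FP16 ggml_half, to keep the sketch free of ggml headers. */
typedef struct {
    float  d[SK_ROWS];
    int8_t qs[SK_QK8_0 * SK_ROWS];
} sketch_q8_0x4;

/* Round half away from zero (same as roundf) without linking libm. */
static int8_t sketch_round(float v) {
    return (int8_t)(v >= 0.0f ? (int)(v + 0.5f) : -(int)(-v + 0.5f));
}

/* Quantize 4 consecutive rows of k floats (row-major, stride k) into
   k / 32 interleaved blocks. */
void sketch_quantize_4x4(const float *x, sketch_q8_0x4 *y, int64_t k) {
    assert(k % SK_QK8_0 == 0);
    const int64_t nb = k / SK_QK8_0;

    for (int64_t i = 0; i < nb; i++) {
        float id[SK_ROWS];

        /* Per-row symmetric scale: d = absmax / 127. */
        for (int r = 0; r < SK_ROWS; r++) {
            float amax = 0.0f;
            for (int j = 0; j < SK_QK8_0; j++) {
                float v = x[r * k + i * SK_QK8_0 + j];
                if (v < 0.0f) v = -v;
                if (v > amax) amax = v;
            }
            const float d = amax / 127.0f;
            id[r] = d != 0.0f ? 1.0f / d : 0.0f;
            y[i].d[r] = d;
        }

        /* Byte j of qs: chunk group j / 16, then the row within that
           group, then the byte within that row's 4-byte chunk. Order:
           row0[0..3], row1[0..3], row2[0..3], row3[0..3], row0[4..7], ... */
        for (int j = 0; j < SK_QK8_0 * SK_ROWS; j++) {
            const int grp = j / (SK_ROWS * SK_CHUNK);
            const int row = (j % (SK_ROWS * SK_CHUNK)) / SK_CHUNK;
            const int col = grp * SK_CHUNK + j % SK_CHUNK;
            y[i].qs[j] = sketch_round(x[row * k + i * SK_QK8_0 + col] * id[row]);
        }
    }
}
```

Each output block holds one 32-column slice of all four rows, so a SIMD kernel can load one contiguous chunk and get the same column range from every row at once.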

Usage

Weight repacking is activated automatically when the CPU backend's extra buffer types include the repack buffer type (enabled by GGML_USE_CPU_REPACK). Tensors allocated through this buffer type are repacked transparently when their data is written into the buffer.

Code Reference

Source Location

GGML repo, file: src/ggml-cpu/repack.cpp (3247 lines).

Signature

// Interleaved quantization packing
void ggml_quantize_mat_q8_0_4x4_generic(const float * GGML_RESTRICT x,
    void * GGML_RESTRICT vy, int64_t k);
void ggml_quantize_mat_q8_0_4x8_generic(const float * GGML_RESTRICT x,
    void * GGML_RESTRICT vy, int64_t k);
void ggml_quantize_mat_q8_K_4x4_generic(const float * GGML_RESTRICT x,
    void * GGML_RESTRICT vy, int64_t k);

// Optimized GEMV/GEMM on repacked data
void ggml_gemv_q4_0_4x4_q8_0(int n, float * GGML_RESTRICT s,
    size_t bs, const void * GGML_RESTRICT vx, const void * GGML_RESTRICT vy, int nr, int nc);
void ggml_gemm_q4_0_4x4_q8_0(int n, float * GGML_RESTRICT s,
    size_t bs, const void * GGML_RESTRICT vx, const void * GGML_RESTRICT vy, int nr, int nc);

// Backend buffer type registration
ggml_backend_buffer_type_t ggml_backend_cpu_repack_buffer_type(void);
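To illustrate what a GEMV over repacked data computes, here is a simplified scalar sketch. It is not ggml's kernel: it assumes 8-bit weights in a 4-row, 4-byte interleaved layout (ggml_gemv_q4_0_4x4_q8_0 consumes 4-bit block_q4_0x4 weights), float scales instead of FP16, and hypothetical names (sketch_gemv_4x4, sketch_w8x4, sketch_q8_0). The parameters n, s, vx, vy, and nc mirror their roles in the signature above; bs and nr are omitted because this sketch computes a single output row.

```c
#include <assert.h>
#include <stdint.h>

#define SK_QK    32  /* elements per block */
#define SK_NCOLS 4   /* weight rows (output columns) interleaved per block */
#define SK_BLEN  4   /* bytes per interleaved chunk */

/* Hypothetical activation block (plain, non-interleaved Q8_0 analogue). */
typedef struct { float d; int8_t qs[SK_QK]; } sketch_q8_0;

/* Hypothetical interleaved 8-bit weight block: 4 scales, then the quants of
   4 weight rows ordered by chunk group, then row, then byte within chunk. */
typedef struct { float d[SK_NCOLS]; int8_t qs[SK_QK * SK_NCOLS]; } sketch_w8x4;

/* s[c] = sum_i W[c][i] * a[i] for c in [0, nc). vx holds nc / 4 groups, each
   of n / 32 consecutive interleaved blocks; vy holds n / 32 activation blocks. */
void sketch_gemv_4x4(int n, float *s, const sketch_w8x4 *vx,
                     const sketch_q8_0 *vy, int nc) {
    assert(n % SK_QK == 0 && nc % SK_NCOLS == 0);
    const int nb = n / SK_QK;

    for (int x = 0; x < nc / SK_NCOLS; x++) {
        const sketch_w8x4 *b = vx + x * nb;
        float sumf[SK_NCOLS] = {0};

        for (int l = 0; l < nb; l++) {
            int sumi[SK_NCOLS] = {0};
            /* One pass per chunk: the same SK_BLEN activation values are
               reused against the matching chunk of all 4 weight rows. */
            for (int k = 0; k < SK_QK / SK_BLEN; k++) {
                for (int j = 0; j < SK_NCOLS; j++) {
                    for (int i = 0; i < SK_BLEN; i++) {
                        const int w = b[l].qs[k * SK_NCOLS * SK_BLEN + j * SK_BLEN + i];
                        const int a = vy[l].qs[k * SK_BLEN + i];
                        sumi[j] += w * a;
                    }
                }
            }
            /* Apply the weight and activation scales once per block. */
            for (int j = 0; j < SK_NCOLS; j++)
                sumf[j] += (float)sumi[j] * b[l].d[j] * vy[l].d;
        }
        for (int j = 0; j < SK_NCOLS; j++)
            s[x * SK_NCOLS + j] = sumf[j];
    }
}
```

The design point the sketch shows is reuse: each activation chunk is read once and multiplied against four weight rows whose matching bytes sit next to each other in memory.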

Import

#include "repack.h"

I/O Contract

Inputs

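Parameter | Type | Required | Description
x | const float * | Yes (packing) | Source float data: 4 consecutive rows of k elements each.
vy | void * (packing), const void * (GEMV/GEMM) | Yes | Packing: destination buffer for interleaved quantized blocks. GEMV/GEMM: the quantized activations.
k | int64_t | Yes (packing) | Number of elements per row (must be a multiple of the block size, e.g. QK8_0 = 32).
n | int | Yes (GEMV/GEMM) | Inner dimension of the matrix multiplication.
vx | const void * | Yes (GEMV/GEMM) | Repacked weights (e.g. block_q4_0x4 for the q4_0 4x4 kernels).
bs | size_t | Yes (GEMV/GEMM) | Row stride of the output s.
nr | int | Yes (GEMV/GEMM) | Number of activation rows.
nc | int | Yes (GEMV/GEMM) | Number of output columns (weight rows).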

Outputs

Output | Type | Description
vy | void * | Interleaved quantized block data (packing functions).
s | float * | Matrix multiplication result buffer (GEMV/GEMM kernels).
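The size of the vy buffer for the Q8_0 4x4 packing follows from the block layout: 4 FP16 deltas (2 bytes each) followed by 4 × 32 int8 quants, or 136 bytes per block, with one block per 32 columns. The helper below is a hypothetical sketch of that arithmetic, assuming the block carries no padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_QK8_0 32
/* Assumed block_q8_0x4 size: 4 FP16 deltas + 4 * 32 int8 quants = 136 bytes. */
#define SK_Q8_0X4_BYTES (4 * 2 + 4 * SK_QK8_0)

/* Hypothetical helper: bytes of vy needed to pack one group of 4 rows
   with k elements each. */
size_t sketch_q8_0x4_bytes(int64_t k) {
    assert(k > 0 && k % SK_QK8_0 == 0);
    return (size_t)(k / SK_QK8_0) * SK_Q8_0X4_BYTES;
}
```

For k = 4096 this gives 128 blocks × 136 bytes = 17,408 bytes, compared with 65,536 bytes for the 4 × 4096 source floats.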

Usage Examples

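Quantizing Four Float Rows into Interleaved Q8_0 (4x4)

#include "repack.h"  // internal ggml-cpu header: block_q8_0x4, packing functions

void pack_example() {
    // Source: 4 consecutive rows of k floats each (row-major);
    // k must be a multiple of QK8_0 (32).
    constexpr int64_t k = 4 * QK8_0;
    float rows[4 * k] = {};           // fill with the data to quantize
    block_q8_0x4 packed[k / QK8_0];   // one interleaved block per 32-column slice

    // Quantize the 4 rows and interleave them in 4x4 format
    ggml_quantize_mat_q8_0_4x4_generic(rows, packed, k);
}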
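When inspecting packed output, for example to compare the result of ggml_quantize_mat_q8_0_4x4_generic against the source rows, it helps to invert the interleaving. The helper sketch_locate below is hypothetical; it assumes quants are taken in fixed-size chunks from each of the 4 rows in turn, with a chunk of 4 bytes for the 4x4 variant and 8 bytes for 4x8, as in the generic scalar code.

```c
#include <assert.h>
#include <stdint.h>

#define SK_QK8_0 32

/* Hypothetical helper: locate element (row, col) of the 4 source rows in the
   packed output. chunk_bytes is the interleave width (assumed 4 for 4x4,
   8 for 4x8). Writes the block index and the byte offset into its qs array. */
void sketch_locate(int row, int64_t col, int chunk_bytes,
                   int64_t *block, int *offset) {
    assert(row >= 0 && row < 4 && col >= 0);
    assert(chunk_bytes == 4 || chunk_bytes == 8);
    const int c = (int)(col % SK_QK8_0);            /* column within the 32-wide block */
    *block  = col / SK_QK8_0;
    *offset = (c / chunk_bytes) * 4 * chunk_bytes   /* which chunk group */
            + row * chunk_bytes                     /* which row's chunk */
            + c % chunk_bytes;                      /* byte within the chunk */
}
```

For the 4x4 variant, element (1, 0) lands at offset 4 of block 0, right after row 0's first four quants; with 8-byte chunks the same element lands at offset 8.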
