Implementation:Ggml org Ggml Python utils
| File Name | examples/python/ggml/utils.py |
| Repository | ggml-org/ggml |
| Lines | 182 |
| Language | Python |
| Domain Tags | Python_Bindings, Tensor_Interop, Quantization |
| Status | Active |
| Last Updated | 2025-05-15 12:00 GMT |
| Knowledge Sources | ggml-org/ggml repository |
Overview
examples/python/ggml/utils.py is a utility module providing high-level helpers for interop between GGML tensors and numpy arrays, including automatic (de/re)quantization. This is the key usability layer for the Python bindings, making quantized tensor manipulation as simple as working with numpy arrays while handling GGML's quantization formats transparently.
Description
The module provides three main functions:
- init(mem_size) -- Creates a GGML context with automatic GC-based freeing via ffi.gc(lib.ggml_init(params), lib.ggml_free)
- copy(from_tensor, to_tensor) -- Transparently copies between numpy arrays and GGML tensors (including quantized ones) by detecting types and using appropriate dequantize/requantize paths. Validates shape consistency and supports an allow_requantize flag
- numpy(tensor) -- Returns a numpy view over GGML tensor data (zero-copy for F32/I32) or a dequantized copy for quantized types. Supports the allow_copy parameter
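The difference between a zero-copy view and a detached copy can be illustrated without GGML at all. The sketch below is only an analogy: a `bytearray` stands in for the memory owned by an unquantized tensor, and `np.frombuffer` plays the role of the view. How the real module obtains the tensor's buffer is not shown here.

```python
import numpy as np

# Stand-in for the memory of an unquantized tensor (hypothetical; the real
# module would read this from the GGML tensor's data pointer).
raw = bytearray(np.arange(6, dtype=np.float32).tobytes())

# Zero-copy: the array shares memory with `raw`, so writes reach the buffer.
view = np.frombuffer(raw, dtype=np.float32).reshape(2, 3)
view[0, 0] = 42.0

# Copy: later writes stay local and never reach the original buffer.
detached = view.copy()
detached[0, 1] = -1.0

# Re-read the underlying buffer to see which writes landed in it.
first_two = np.frombuffer(raw, dtype=np.float32)[:2]
```

Only the write made through `view` is visible in `raw`; the write made through `detached` is not. This is the same trade-off a caller accepts when requesting a dequantized copy of a quantized tensor.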
The TensorLike type alias (Union[ffi.CData, np.ndarray]) enables unified handling of both numpy and GGML tensors. Internal helpers manage type detection, shape validation, quantization block size alignment, and data pointer access.
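As a rough illustration of shape validation and block size alignment, the following sketch checks two shapes before a copy. The helper name `check_copy_compatible`, its parameters, and the block size of 32 are assumptions made for this example and are not taken from the module's internal helpers.

```python
def check_copy_compatible(src_shape, dst_shape, block_size=1):
    """Hypothetical pre-copy check, not the module's actual helper.

    Requires identical shapes, and an innermost dimension that is a whole
    number of quantization blocks (block_size=1 means unquantized).
    """
    assert tuple(src_shape) == tuple(dst_shape), (
        f"shape mismatch: {tuple(src_shape)} vs {tuple(dst_shape)}"
    )
    assert src_shape[-1] % block_size == 0, (
        f"innermost dimension {src_shape[-1]} is not a multiple of {block_size}"
    )
    return True


def is_rejected(*args, **kwargs):
    """Return True if the check raises AssertionError for these arguments."""
    try:
        check_copy_compatible(*args, **kwargs)
    except AssertionError:
        return True
    return False


aligned_ok = check_copy_compatible((4, 64), (4, 64), block_size=32)
mismatch_rejected = is_rejected((4, 64), (64, 4), block_size=32)
misaligned_rejected = is_rejected((4, 48), (4, 48), block_size=32)
```

A row of 48 values cannot be split into whole blocks of 32, so the third call is rejected even though the two shapes match.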
Usage
from ggml.utils import init, copy, numpy

# Create a GGML context
ctx = init(16 * 1024 * 1024)  # 16 MB

# Convert a quantized GGML tensor to numpy
arr = numpy(quantized_tensor, allow_copy=True)

# Copy from numpy to a GGML tensor
copy(numpy_array, ggml_tensor)
Code Reference
Source Location
| Repository | File | Lines |
|---|---|---|
| ggml-org/ggml | examples/python/ggml/utils.py | 182 |
Key Signatures
def init(mem_size: int, mem_buffer: ffi.CData = ffi.NULL, no_alloc: bool = False) -> ffi.CData:
"""Initialize a ggml context with automatic cleanup."""
TensorLike = Union[ffi.CData, np.ndarray]
def copy(from_tensor: TensorLike, to_tensor: TensorLike, allow_requantize: bool = True):
"""Copy between ggml and numpy tensors with transparent (de/re)quantization."""
def numpy(tensor: ffi.CData, allow_copy: Union[bool, np.ndarray] = False,
allow_requantize=False) -> np.ndarray:
"""Convert a ggml tensor to a numpy array (view for unquantized, copy for quantized)."""
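The "automatic cleanup" mentioned for init ties the release of a resource to garbage collection of the handle. The sketch below mimics that pattern with the standard library's `weakref.finalize` instead of cffi, so it runs without GGML installed. `FakeContext`, `fake_free`, and `init_sketch` are hypothetical stand-ins, not part of the bindings.

```python
import gc
import weakref

freed = []  # records which stand-in contexts have been released


class FakeContext:
    """Stand-in for the pointer returned by lib.ggml_init (hypothetical)."""

    def __init__(self, mem_size):
        self.mem_size = mem_size


def fake_free(handle_id):
    """Stand-in for lib.ggml_free; only records that it was called."""
    freed.append(handle_id)


def init_sketch(mem_size):
    ctx = FakeContext(mem_size)
    # Run fake_free once ctx becomes unreachable, similar in spirit to
    # wrapping the context pointer with ffi.gc(..., lib.ggml_free).
    weakref.finalize(ctx, fake_free, id(ctx))
    return ctx


ctx = init_sketch(16 * 1024 * 1024)
ctx_id = id(ctx)
del ctx       # drop the last reference
gc.collect()  # not strictly needed on CPython, but makes the intent explicit
```

The caller never frees the context explicitly. The cost of this design is that the timing of the release depends on when the last reference disappears.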
I/O Contract
Inputs
- GGML tensors -- ffi.CData pointers to GGML tensor structs (possibly quantized)
- numpy arrays -- Standard numpy ndarrays (float32, int32, etc.)
- Configuration flags -- allow_copy, allow_requantize
Outputs
- numpy arrays -- Views or copies of tensor data as numpy arrays
- Errors -- ValueError for quantized tensors without allow_copy, AssertionError for shape mismatches
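The three accepted values of allow_copy (False, True, or a pre-allocated array) and the error contract above can be sketched as a small decision function. `resolve_destination` is a hypothetical helper written for this page. It only mirrors the documented behaviour and does not reproduce the module's code.

```python
import numpy as np


def resolve_destination(shape, is_quantized, allow_copy=False):
    """Hypothetical helper: pick the output array for a quantized source."""
    if not is_quantized:
        return None  # a caller would return a zero-copy view instead
    # Check for an array first: truth-testing an ndarray is ambiguous.
    if isinstance(allow_copy, np.ndarray):
        assert allow_copy.shape == tuple(shape), "shape mismatch"
        return allow_copy
    if not allow_copy:
        raise ValueError(
            "quantized tensor: pass allow_copy=True or an output array"
        )
    return np.empty(shape, dtype=np.float32)


fresh = resolve_destination((2, 32), is_quantized=True, allow_copy=True)

preallocated = np.empty((2, 32), dtype=np.float32)
reused = resolve_destination((2, 32), is_quantized=True, allow_copy=preallocated)

try:
    resolve_destination((2, 32), is_quantized=True)
    refused = False
except ValueError:
    refused = True
```

Passing an existing array avoids a fresh allocation on every call, which can matter when the same tensor shape is dequantized repeatedly.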
Usage Examples
Working with quantized tensors:
import numpy as np
from ggml.utils import numpy, copy

# Get numpy view of an F32 tensor (zero-copy)
arr = numpy(f32_tensor)  # Returns a view, changes affect the tensor

# Dequantize a Q4_0 tensor to float32
arr = numpy(q4_0_tensor, allow_copy=True)  # Returns a copy

# Copy dequantized data into a pre-allocated array
output = np.empty(shape, dtype=np.float32)
arr = numpy(q4_0_tensor, allow_copy=output)

# Copy between different quantization types
copy(q4_0_tensor, q8_0_tensor, allow_requantize=True)
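Dequantized copies and requantization are lossy, which is one reason they sit behind explicit flags. The sketch below uses a simplified 8-bit block scheme in plain numpy (one float scale per block of 32 values) to show the round-trip error. It is an assumption-laden illustration, not GGML's storage layout or its quantization routines.

```python
import numpy as np

BLOCK = 32  # values per block (assumed for this illustration)


def quantize_blocks(x):
    """Return (scales, int8 codes) for a 1-D float32 array.

    The array length must be a multiple of BLOCK.
    """
    assert x.size % BLOCK == 0
    blocks = x.reshape(-1, BLOCK)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    # One scale per block; fall back to 1.0 for all-zero blocks.
    scales = np.where(amax > 0, amax / 127.0, 1.0).astype(np.float32)
    codes = np.round(blocks / scales).astype(np.int8)
    return scales, codes


def dequantize_blocks(scales, codes):
    """Rebuild float32 values from per-block scales and int8 codes."""
    return (codes.astype(np.float32) * scales).reshape(-1)


rng = np.random.default_rng(0)
original = rng.standard_normal(4 * BLOCK).astype(np.float32)

scales, codes = quantize_blocks(original)
restored = dequantize_blocks(scales, codes)
max_error = float(np.abs(original - restored).max())
```

Each restored value differs from the original by at most about half a quantization step of its block, so repeated requantization between formats can accumulate error.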
Related Pages
Implements Principle
Related Implementations
- Implementation:Ggml_org_Ggml_Ggml_init -- Context initialization
- Implementation:Ggml_org_Ggml_Quants_api -- Quantization functions used internally