Implementation:Ggml org Ggml Python utils

**Implementation Metadata**
File Name	`examples/python/ggml/utils.py`
Repository	ggml-org/ggml
Lines	182
Language	Python
Domain Tags	Python_Bindings, Tensor_Interop, Quantization
Status	Active
Last Updated	2025-05-15 12:00 GMT
Knowledge Sources	ggml-org/ggml repository

Overview

examples/python/ggml/utils.py is a utility module of high-level helpers for moving data between GGML tensors and numpy arrays, with automatic (de/re)quantization. It is the main usability layer of the Python bindings: callers read and write quantized tensors through ordinary float32 numpy arrays, and the module handles GGML's quantization formats for them.

Description

The module provides three main functions:

init(mem_size) -- Creates a GGML context that is freed automatically when the returned pointer is garbage collected, via ffi.gc(lib.ggml_init(params), lib.ggml_free).
copy(from_tensor, to_tensor) -- Copies between numpy arrays and GGML tensors (including quantized ones) by detecting each side's type and choosing the matching dequantize/requantize path. Validates that both sides have the same shape. The allow_requantize flag controls whether a copy between two different quantization types is permitted.
numpy(tensor) -- Returns a numpy view over the GGML tensor data (zero-copy for F32/I32) or a dequantized copy for quantized types. The allow_copy parameter must be set for quantized tensors; it accepts either True or a pre-allocated numpy array that receives the dequantized data.

The TensorLike type alias (Union[ffi.CData, np.ndarray]) enables unified handling of both numpy and GGML tensors. Internal helpers manage type detection, shape validation, quantization block size alignment, and data pointer access.
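
The shape and block-size checks mentioned above can be sketched in plain Python. This is a minimal illustration, not the module's code: the helper names, the BLOCK_SIZES table, and the choice to check only the row length (the last numpy axis) are assumptions made for the example. The block size of 32 matches GGML's Q4_0 and Q8_0 formats.

```python
from typing import Tuple

# Hypothetical lookup table: elements per quantization block.
# 32 is the block size of GGML's Q4_0 and Q8_0; 1 stands for "not quantized".
BLOCK_SIZES = {"f32": 1, "q4_0": 32, "q8_0": 32}


def expect_same_shape(src_shape: Tuple[int, ...], dst_shape: Tuple[int, ...]) -> None:
    """Hypothetical helper: source and destination must have identical shapes."""
    assert tuple(src_shape) == tuple(dst_shape), (
        f"shape mismatch: source {tuple(src_shape)} vs destination {tuple(dst_shape)}"
    )


def check_shape_consistent_with_type(shape: Tuple[int, ...], type_name: str) -> None:
    """Hypothetical helper: a row must split into a whole number of blocks."""
    block_size = BLOCK_SIZES[type_name]
    assert shape[-1] % block_size == 0, (
        f"row length {shape[-1]} is not divisible by block size {block_size} "
        f"required for {type_name}"
    )


# A (4, 64) tensor can hold Q4_0 data: each row is exactly two blocks.
expect_same_shape((4, 64), (4, 64))
check_shape_consistent_with_type((4, 64), "q4_0")
```

Failing fast with an assertion before any bytes are moved keeps a bad copy from writing a partial block into the destination.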

Usage

from ggml.utils import init, copy, numpy

# Create a GGML context
ctx = init(16 * 1024 * 1024)  # 16 MB

# Convert a quantized GGML tensor to numpy
arr = numpy(quantized_tensor, allow_copy=True)

# Copy from numpy to a GGML tensor
copy(numpy_array, ggml_tensor)
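
The automatic cleanup that init provides can be illustrated without the native library. The sketch below swaps in the standard library's weakref.finalize as an analogue of ffi.gc: both attach a release callback to an object so that it runs once the object is garbage collected. FakeContext, fake_init, and the freed list are hypothetical names used only for this example.

```python
import weakref

freed = []  # records the mem_size of every hypothetical context that was released


class FakeContext:
    """Hypothetical stand-in for a native context handle."""

    def __init__(self, mem_size: int):
        self.mem_size = mem_size


def fake_init(mem_size: int) -> FakeContext:
    ctx = FakeContext(mem_size)
    # Register a release callback tied to ctx's lifetime,
    # similar in spirit to ffi.gc(pointer, destructor).
    weakref.finalize(ctx, freed.append, mem_size)
    return ctx


ctx = fake_init(16 * 1024 * 1024)
del ctx  # dropping the last reference lets the callback run
```

One consequence of this pattern is that the context lives only as long as something references it, so code that uses tensors allocated in a context should keep the context object alive for at least as long as those tensors.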

Code Reference

Source Location

Repository	File	Lines
ggml-org/ggml	`examples/python/ggml/utils.py`	182

Key Signatures

def init(mem_size: int, mem_buffer: ffi.CData = ffi.NULL, no_alloc: bool = False) -> ffi.CData:
    """Initialize a ggml context with automatic cleanup."""

TensorLike = Union[ffi.CData, np.ndarray]

def copy(from_tensor: TensorLike, to_tensor: TensorLike, allow_requantize: bool = True):
    """Copy between ggml and numpy tensors with transparent (de/re)quantization."""

def numpy(tensor: ffi.CData, allow_copy: Union[bool, np.ndarray] = False,
          allow_requantize=False) -> np.ndarray:
    """Convert a ggml tensor to a numpy array (view for unquantized, copy for quantized)."""

I/O Contract

Inputs

GGML tensors -- ffi.CData pointers to GGML tensor structs (possibly quantized)
numpy arrays -- Standard numpy ndarrays (float32, int32, etc.)
Configuration flags -- allow_copy, allow_requantize

Outputs

numpy arrays -- Views or copies of tensor data as numpy arrays
Errors -- ValueError when numpy() is called on a quantized tensor without allow_copy; AssertionError when the source and destination shapes do not match
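
The difference between a view and a copy in the outputs above can be shown with numpy alone. In this sketch a bytearray stands in for the memory behind an unquantized tensor; that stand-in is an assumption for illustration and no GGML call is involved.

```python
import numpy as np

# A mutable byte buffer plays the role of the tensor's backing memory.
backing = bytearray(np.arange(4, dtype=np.float32).tobytes())

# View: shares memory with the buffer, so writes go through to it.
view = np.frombuffer(backing, dtype=np.float32)
view[0] = 42.0

# Copy: an independent array, so writes stay local.
copied = view.copy()
copied[1] = -1.0

# Re-read the buffer to see which writes reached it.
reread = np.frombuffer(backing, dtype=np.float32)
```

After this runs, the write through view is visible in reread, while the write to copied is not. A dequantized array behaves like copied: editing it leaves the quantized tensor unchanged until the data is copied back.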

Usage Examples

Working with quantized tensors:

import numpy as np
from ggml.utils import numpy, copy

# Get numpy view of an F32 tensor (zero-copy)
arr = numpy(f32_tensor)  # Returns a view, changes affect the tensor

# Dequantize a Q4_0 tensor to float32
arr = numpy(q4_0_tensor, allow_copy=True)  # Returns a copy

# Dequantize into a pre-allocated array (shape must match the tensor's shape)
output = np.empty(shape, dtype=np.float32)
arr = numpy(q4_0_tensor, allow_copy=output)

# Copy between different quantization types
copy(q4_0_tensor, q8_0_tensor, allow_requantize=True)
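
What a dequantize/requantize round trip involves can be sketched in numpy. The code below is loosely modeled on GGML's Q8_0 layout (blocks of 32 values, one scale per block, 8-bit integers), but it is a simplified illustration: the function names are hypothetical, scales are kept as float32, and the module itself relies on GGML's native quantization routines rather than anything like this.

```python
import numpy as np

BLOCK = 32  # elements per block, as in GGML's Q8_0


def quantize_q8(x: np.ndarray):
    """Quantize a 1-D float32 array into per-block scales and int8 values."""
    assert x.size % BLOCK == 0, "length must be a multiple of the block size"
    blocks = x.reshape(-1, BLOCK)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    scale = (amax / 127.0).astype(np.float32)
    safe = np.where(scale == 0, 1.0, scale)  # avoid dividing by zero for all-zero blocks
    q = np.round(blocks / safe).astype(np.int8)
    return scale, q


def dequantize_q8(scale: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Rebuild float32 values from per-block scales and int8 values."""
    return (q.astype(np.float32) * scale).reshape(-1)


rng = np.random.default_rng(0)
x = rng.standard_normal(64).astype(np.float32)
scale, q = quantize_q8(x)
x_hat = dequantize_q8(scale, q)
max_err = float(np.abs(x - x_hat).max())
```

The round trip is lossy: each value can move by up to half of its block's scale. This is why converting between two quantized types, which passes through float32 in the same way, is gated behind an explicit flag.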

Related Pages

Implements Principle

Principle:Ggml_org_Ggml_Python_Tensor_Interop

Related Implementations

Implementation:Ggml_org_Ggml_Ggml_init -- Context initialization
Implementation:Ggml_org_Ggml_Quants_api -- Quantization functions used internally
