Implementation: ggml_init (GGML)
Metadata
| Field | Value |
|---|---|
| Page Type | Implementation (API Doc) |
| Knowledge Sources | GGML |
| Domains | ML_Infrastructure, Tensor_Computing |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
Concrete tool for initializing a tensor metadata memory context (arena) provided by the GGML library.
Description
ggml_init is the primary entry point for creating a ggml_context, the arena-based memory pool that holds tensor metadata in GGML. It allocates (or accepts) a contiguous memory block and returns an opaque context pointer through which all subsequent tensor creation operations are performed.
The initialization process works as follows:
- Parameter validation: The function receives a `ggml_init_params` struct specifying the desired memory pool size, an optional external buffer, and allocation behavior flags.
- Memory acquisition: If `mem_buffer` is `NULL`, the function internally allocates `mem_size` bytes via the system allocator. If `mem_buffer` is provided, the function uses that externally managed buffer as the arena, avoiding any internal allocation.
- Context initialization: Internal bookkeeping structures are set up, including the arena offset pointer (initially at the beginning of the pool), the total capacity, and the `no_alloc` flag.
- Return: A pointer to the initialized `ggml_context` is returned. All subsequent calls to `ggml_new_tensor` and related functions use this context to bump-allocate tensor metadata from the arena.
When the context is no longer needed, it is destroyed with ggml_free(), which releases the entire arena in a single operation -- freeing all tensor metadata it contains regardless of the number of tensors created.
The no_alloc flag is critical for backend-managed workflows: when set to true, the context arena stores only tensor metadata (shape, dtype, strides, name, graph linkage), and the actual tensor data buffers are allocated separately by a backend allocator (e.g., for GPU memory). This allows a small CPU-resident metadata arena to manage tensors whose data resides in device memory.
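For metadata-only contexts, the arena only needs room for the fixed per-tensor bookkeeping. A common sizing pattern uses `ggml_tensor_overhead()`, which GGML exposes for this purpose (the tensor count of 128 here is an arbitrary illustration):

```c
#include "ggml.h"

// Size the pool for a known number of tensors; each tensor's metadata
// occupies a fixed overhead reported by ggml_tensor_overhead().
size_t n_tensors = 128;
struct ggml_init_params params = {
    .mem_size   = n_tensors * ggml_tensor_overhead(),
    .mem_buffer = NULL,
    .no_alloc   = true, // data buffers come from a backend allocator
};
```

This keeps the CPU-resident arena small even when the tensors themselves are large and live in device memory.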
Code Reference
Source Location
GGML repo, file: src/ggml.c, lines L1522-1562.
Signature
```c
struct ggml_context * ggml_init(struct ggml_init_params params);
```
Import
```c
#include "ggml.h"
```
Dependencies
`ggml.h` -- public header declaring `ggml_init`, `ggml_init_params`, and `ggml_context`.
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| `params.mem_size` | `size_t` | Yes | Total size in bytes of the context memory pool. Determines the maximum amount of tensor metadata (and optionally tensor data) that can be allocated within this context. |
| `params.mem_buffer` | `void *` | No | Optional pointer to an externally managed memory buffer. If `NULL`, `ggml_init` allocates the buffer internally. If non-`NULL`, the provided buffer is used as the arena (the caller retains ownership of the underlying memory). |
| `params.no_alloc` | `bool` | No | When `true`, tensor data storage is not allocated within the context arena; only tensor metadata structures are placed in the pool, and tensor data must be allocated separately via a backend allocator. When `false`, tensor data is co-located with metadata in the arena. |
Outputs
| Output | Type | Description |
|---|---|---|
| Context pointer | `struct ggml_context *` | An opaque pointer to the initialized memory context, used as the first argument to all tensor creation functions (`ggml_new_tensor`, `ggml_new_tensor_1d`, etc.). Returns `NULL` on failure (e.g., if internal allocation fails or the maximum number of contexts is exceeded). |
Usage Examples
Basic Context for Tensor Metadata and Data
```c
#include "ggml.h"

// Allocate a context with 64 MB for both metadata and tensor data
struct ggml_init_params params = {
    .mem_size   = 64 * 1024 * 1024, // 64 MB
    .mem_buffer = NULL,             // let ggml allocate internally
    .no_alloc   = false,            // tensor data stored in the arena
};
struct ggml_context * ctx = ggml_init(params);

// Create tensors -- metadata and data are both in the arena
struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 768, 768);
struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 768, 768);

// ... use tensors for computation ...

// Free the entire context (all tensor metadata and data) in one call
ggml_free(ctx);
```
Metadata-Only Context with Backend Allocator
```c
#include "ggml.h"

// Small context for metadata only (tensor data managed by backend)
struct ggml_init_params params = {
    .mem_size   = 1 * 1024 * 1024, // 1 MB for metadata
    .mem_buffer = NULL,            // let ggml allocate internally
    .no_alloc   = true,            // skip tensor data allocation
};
struct ggml_context * ctx = ggml_init(params);

// Create tensor metadata -- no data is allocated in the arena
struct ggml_tensor * weight = ggml_new_tensor_2d(ctx, GGML_TYPE_F16, 4096, 4096);

// Tensor data is allocated separately by the backend:
// ggml_backend_alloc_ctx_tensors(ctx, backend);

// ... perform computation using backend ...

ggml_free(ctx);
```
Using an External Memory Buffer
```c
#include "ggml.h"
#include <stdlib.h>

// Provide a pre-allocated buffer as the arena
size_t buf_size = 16 * 1024 * 1024; // 16 MB
void * buf = malloc(buf_size);

struct ggml_init_params params = {
    .mem_size   = buf_size,
    .mem_buffer = buf,   // use external buffer
    .no_alloc   = false,
};
struct ggml_context * ctx = ggml_init(params);

// Create tensors within the externally provided buffer
struct ggml_tensor * t = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1024);

// ... use tensor ...

// Free the context (does not free the external buffer)
ggml_free(ctx);

// Caller is responsible for freeing the external buffer
free(buf);
```