Principle:Ggml_org_Ggml_Tensor_Creation
Summary
Tensor creation is the construction of typed, multi-dimensional array objects for numerical computation. In GGML, tensors are the fundamental data abstraction for machine learning workloads: typed arrays that carry shape, stride, and metadata through every stage of a computation graph.
Theory
Tensors generalize scalars, vectors, and matrices to n-dimensional arrays. Two common conventions determine how the elements of such an array map to contiguous memory:
- Row-major order (C-style) -- the last index varies fastest in memory.
- Column-major order (Fortran-style) -- the first index varies fastest in memory.
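In terms of index order, GGML follows the first-index-fastest convention: for a 2-D tensor with dimensions ne0 x ne1, elements along ne0 are contiguous. Because ne0 is the length of a row (the number of columns), this is the same memory layout as a row-major C array whose dimensions are listed innermost first. Strides (nb0, nb1, nb2, nb3) record the byte distance between successive elements along each axis, enabling views and transposes without copying data.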
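The relationship between sizes and strides can be sketched in a few lines. This is a pure-Python model rather than GGML code: the helper names contiguous_strides and byte_offset are hypothetical, and the model assumes a non-quantized element type, where one element occupies type_size bytes.

```python
def contiguous_strides(ne, type_size):
    """Byte strides for a contiguous tensor whose ne[0] axis is innermost."""
    nb = [type_size]
    for dim in ne[:-1]:
        nb.append(nb[-1] * dim)
    return nb


def byte_offset(idx, nb):
    """Byte offset of the element at index (i0, i1, i2, i3)."""
    return sum(i * s for i, s in zip(idx, nb))


ne = [4, 3, 1, 1]                 # ne0 = 4 elements per row, ne1 = 3 rows
nb = contiguous_strides(ne, 4)    # 4-byte elements, e.g. 32-bit floats
print(nb)                         # [4, 16, 48, 48]

# Transposed view: swap the first two sizes and strides; no data is moved.
ne_t = [ne[1], ne[0], ne[2], ne[3]]
nb_t = [nb[1], nb[0], nb[2], nb[3]]

print(byte_offset((2, 1, 0, 0), nb))    # 24
print(byte_offset((1, 2, 0, 0), nb_t))  # 24, the same element seen through the view
```

Both lookups land on the same byte, which is why a transpose can be expressed purely as a change of metadata.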
Type System for Quantized Representations
A distinguishing feature of GGML is its type system designed for quantized inference. Beyond the standard IEEE types, GGML defines block-quantized formats that pack weights into compact representations with per-block scale factors:
| Category | Example Types |
|---|---|
| IEEE floating point | GGML_TYPE_F32, GGML_TYPE_F16, GGML_TYPE_BF16 |
| Integer | GGML_TYPE_I8, GGML_TYPE_I16, GGML_TYPE_I32 |
| Block-quantized (4-bit) | GGML_TYPE_Q4_0, GGML_TYPE_Q4_1, GGML_TYPE_Q4_K |
| Block-quantized (5-bit) | GGML_TYPE_Q5_0, GGML_TYPE_Q5_1, GGML_TYPE_Q5_K |
| Block-quantized (8-bit) | GGML_TYPE_Q8_0, GGML_TYPE_Q8_1, GGML_TYPE_Q8_K |
| K-quant mixed | GGML_TYPE_Q2_K, GGML_TYPE_Q3_K, GGML_TYPE_Q6_K |
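To make the idea of a per-block scale factor concrete, here is a minimal sketch of symmetric 8-bit block quantization. It is a simplified model in the spirit of the 8-bit formats, not GGML's actual kernels or bit layout. The function names are hypothetical, the toy block is much shorter than a real one, and the fallback scale of 1.0 for an all-zero block is an assumption of this sketch.

```python
def quantize_block(values):
    """Quantize one block of floats to signed 8-bit integers sharing one scale."""
    amax = max(abs(v) for v in values)
    # Assumption of this sketch: an all-zero block gets a scale of 1.0.
    scale = amax / 127.0 if amax > 0 else 1.0
    quants = [round(v / scale) for v in values]
    return scale, quants


def dequantize_block(scale, quants):
    """Recover approximate floats from the shared scale and the integer codes."""
    return [scale * q for q in quants]


block = [0.5, -1.0, 0.25, 0.0]    # toy block for illustration only
scale, quants = quantize_block(block)
restored = dequantize_block(scale, quants)

# Each restored value differs from the original by at most half a scale step.
for original, approx in zip(block, restored):
    print(f"{original:+.4f} -> {approx:+.4f}")
```

Storing one scale per block rather than per tensor keeps the rounding error proportional to the largest magnitude within that block only.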
GGML supports 30+ element types in total. Each type entry in the internal type-traits table records the block size, the byte size per block, and conversion routines to/from float, so that tensor operations can be dispatched generically regardless of the underlying representation.
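A rough model of such a traits table shows how block size and bytes per block are enough to compute storage sizes generically. The table below is an illustrative subset, not GGML's internal structure: the names TypeTraits, TRAITS, and row_size are hypothetical, conversion routines are omitted, and the sizes for Q8_0 and Q4_0 assume a 32-element block with a 2-byte scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeTraits:
    block_size: int   # elements per block
    type_size: int    # bytes per block


# Illustrative subset; block layouts are assumptions stated in the lead-in.
TRAITS = {
    "F32":  TypeTraits(block_size=1,  type_size=4),
    "F16":  TypeTraits(block_size=1,  type_size=2),
    "Q8_0": TypeTraits(block_size=32, type_size=34),  # 2-byte scale + 32 one-byte codes
    "Q4_0": TypeTraits(block_size=32, type_size=18),  # 2-byte scale + 32 four-bit codes
}


def row_size(type_name, ne0):
    """Bytes needed to store one row of ne0 elements of the given type."""
    traits = TRAITS[type_name]
    if ne0 % traits.block_size != 0:
        raise ValueError("row length must be a multiple of the block size")
    return ne0 // traits.block_size * traits.type_size


print(row_size("F32", 4096))    # 16384
print(row_size("Q4_0", 4096))   # 2304
```

Because callers only consult the table, adding another representation means adding one entry rather than touching every size calculation.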
Core Concepts
- Shape -- an ordered tuple of up to 4 dimension sizes (ne[0] .. ne[3]). Unused trailing dimensions default to 1.
- Stride -- byte offsets (nb[0] .. nb[3]) that describe memory layout. Non-contiguous strides enable zero-copy views and transposes.
- Element type -- one of the enum ggml_type values. Determines byte width, quantization block size, and available kernels.
- Context allocation -- every tensor is allocated from a ggml_context, a bump-pointer arena that owns the tensor metadata (and optionally the data buffer).
- Backend data -- tensor storage can live on CPU, GPU, or other accelerator memory managed by ggml_backend.
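The shape-padding and arena ideas above can be combined in a small model. This is a sketch under stated assumptions, not the GGML API: Arena and new_tensor are hypothetical names, the 16-byte alignment is an arbitrary choice for illustration, the arena tracks offsets instead of real memory, and only non-quantized element sizes are modeled.

```python
class Arena:
    """Bump-pointer arena: allocating only advances an offset; nothing is freed individually."""

    def __init__(self, size, align=16):
        self.size = size
        self.align = align    # alignment chosen for illustration only
        self.offset = 0

    def alloc(self, nbytes):
        # Round the current offset up to the next multiple of the alignment.
        start = (self.offset + self.align - 1) // self.align * self.align
        if start + nbytes > self.size:
            raise MemoryError("arena exhausted")
        self.offset = start + nbytes
        return start


def new_tensor(arena, type_size, *dims):
    """Create tensor metadata with a 4-entry shape and contiguous byte strides."""
    if not 1 <= len(dims) <= 4:
        raise ValueError("between 1 and 4 dimensions are supported")
    ne = list(dims) + [1] * (4 - len(dims))   # unused trailing dimensions become 1
    nb = [type_size]
    for dim in ne[:-1]:
        nb.append(nb[-1] * dim)
    data = arena.alloc(nb[3] * ne[3])         # total bytes of a contiguous tensor
    return {"ne": ne, "nb": nb, "data": data}


arena = Arena(1024)
a = new_tensor(arena, 4, 4, 3)   # 2-D tensor of 4-byte elements
b = new_tensor(arena, 2, 8)      # 1-D tensor of 2-byte elements
print(a["ne"], a["nb"], a["data"])   # [4, 3, 1, 1] [4, 16, 48, 48] 0
print(b["ne"], b["nb"], b["data"])   # [8, 1, 1, 1] [2, 16, 16, 16] 48
```

Releasing the arena as a whole frees every tensor created from it at once, which is the usual trade-off of bump allocation: very cheap allocation in exchange for no per-object free.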