Principle:Ggml_org_Ggml_GGUF_File_Creation
Summary
Creating structured binary files for ML model serialization using the GGUF format. GGUF (GGML Universal File) is the successor to the legacy GGML binary format, providing a self-describing binary format with a typed key-value metadata header and aligned tensor data for efficient storage and loading of machine learning models.
Theory
GGUF is a self-describing binary format designed for ML model serialization. The core insight is that a model file should carry all the metadata needed to load and use the model without requiring external configuration files or format-specific parsing logic.
The format achieves this through two primary sections:
- Typed key-value metadata header -- an extensible collection of strongly-typed key-value pairs that describe the model's architecture, tokenizer configuration, training parameters, and any other arbitrary metadata.
- Aligned tensor data -- raw tensor weights laid out with configurable alignment boundaries, enabling efficient memory-mapped access without copying data into separate buffers.
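The alignment rule reduces to a small piece of arithmetic. The sketch below assumes the default 32-byte alignment and measures offsets from the start of the tensor data section. The helper names align_up and plan_tensor_offsets are hypothetical and not part of ggml. Each tensor starts at the next multiple of the alignment after the previous tensor ends, which is what lets a loader hand out pointers directly into a memory-mapped file.

```python
def align_up(offset: int, alignment: int = 32) -> int:
    """Round offset up to the next multiple of alignment (unchanged if already aligned)."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return offset + (alignment - offset % alignment) % alignment


def plan_tensor_offsets(nbytes_per_tensor, alignment: int = 32):
    """Place tensors back to back, each starting on an alignment boundary.

    Returns (offsets, end) where offsets are relative to the start of the
    data section and end is the byte just past the last tensor.
    """
    offsets = []
    cursor = 0
    for nbytes in nbytes_per_tensor:
        cursor = align_up(cursor, alignment)  # skip padding bytes, if any
        offsets.append(cursor)
        cursor += nbytes
    return offsets, cursor
```

For tensors of 40, 100, and 8 bytes this yields offsets 0, 64, and 192, with the last tensor ending at byte 200 of the data section.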
Binary Layout
The GGUF file is structured as a sequential binary stream with the following layout:
| Section | Description |
|---|---|
| Magic number | 4-byte ASCII literal GGUF identifying the file format. |
| Version | Format version number, stored as a uint32 (currently version 3), so readers can tell which revision of the specification a file follows. |
| Tensor count | Total number of tensors stored in the file (uint64). |
| KV count | Number of key-value metadata pairs in the header (uint64). |
| KV pairs | Sequence of typed key-value entries. Each entry contains a string key, a type tag, and the corresponding value. Supported value types are signed and unsigned integers from 8 to 64 bits, 32- and 64-bit floats, booleans, UTF-8 strings, and arrays. |
| Tensor info | Per-tensor metadata: name, number of dimensions, dimension sizes, element type, and byte offset into the data section. |
| Padding | Alignment padding to ensure the tensor data section begins at a properly aligned boundary (default 32 bytes, overridable with the general.alignment metadata key). |
| Tensor data | Raw tensor weight data, concatenated sequentially with per-tensor alignment padding as needed. |
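To make the layout concrete, here is a minimal sketch that serializes a few metadata pairs and 32-bit float tensors into an in-memory GGUF version 3 image using only the Python standard library. It is an illustration, not ggml's own writer: build_gguf and its helpers are hypothetical, only a subset of value types (string, bool, uint32, float32) is handled, every tensor is assumed to be F32 (ggml type tag 0), byte order is little-endian, and the default 32-byte alignment is used without writing a general.alignment key.

```python
import struct

GGUF_MAGIC = b"GGUF"
GGUF_VERSION = 3
ALIGNMENT = 32  # GGUF default; a file may override it with "general.alignment"

# GGUF metadata value-type tags (the subset this sketch handles).
TYPE_UINT32 = 4
TYPE_FLOAT32 = 6
TYPE_BOOL = 7
TYPE_STRING = 8

GGML_TYPE_F32 = 0  # ggml element type tag for 32-bit floats


def _pad(n: int) -> int:
    """Bytes needed to advance n to the next ALIGNMENT boundary."""
    return (ALIGNMENT - n % ALIGNMENT) % ALIGNMENT


def _string(s: str) -> bytes:
    """GGUF string: uint64 byte length followed by UTF-8 bytes (no terminator)."""
    raw = s.encode("utf-8")
    return struct.pack("<Q", len(raw)) + raw


def _kv(key: str, value) -> bytes:
    """Encode one key, its uint32 type tag, and the value."""
    if isinstance(value, str):
        return _string(key) + struct.pack("<I", TYPE_STRING) + _string(value)
    if isinstance(value, bool):  # must precede int: bool is a subclass of int
        return _string(key) + struct.pack("<I?", TYPE_BOOL, value)
    if isinstance(value, int):
        return _string(key) + struct.pack("<II", TYPE_UINT32, value)
    if isinstance(value, float):
        return _string(key) + struct.pack("<If", TYPE_FLOAT32, value)
    raise TypeError(f"unsupported metadata type for {key!r}: {type(value).__name__}")


def build_gguf(metadata: dict, tensors: dict) -> bytes:
    """tensors maps name -> (shape, flat list of floats), all stored as F32."""
    out = bytearray(GGUF_MAGIC)
    out += struct.pack("<IQQ", GGUF_VERSION, len(tensors), len(metadata))
    for key, value in metadata.items():
        out += _kv(key, value)

    # Tensor info records; offsets are relative to the start of the data section.
    blobs = []
    offset = 0
    for name, (shape, values) in tensors.items():
        count = 1
        for dim in shape:
            count *= dim
        if count != len(values):
            raise ValueError(f"{name}: shape {shape} does not match {len(values)} values")
        offset += _pad(offset)
        out += _string(name) + struct.pack("<I", len(shape))
        out += struct.pack(f"<{len(shape)}Q", *shape)  # dimensions written as given
        out += struct.pack("<IQ", GGML_TYPE_F32, offset)
        blob = struct.pack(f"<{len(values)}f", *values)
        blobs.append((offset, blob))
        offset += len(blob)

    out += b"\x00" * _pad(len(out))  # data section starts on an aligned boundary
    data_start = len(out)
    for rel_offset, blob in blobs:
        out += b"\x00" * (data_start + rel_offset - len(out))  # per-tensor padding
        out += blob
    return bytes(out)
```

Because tensor offsets depend only on tensor sizes and the alignment, every tensor info record is known before any weight data is written, so a writer can emit the complete header first and then stream the data.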
Core Concepts
- Self-describing format -- the file header contains all information needed to interpret the tensor data, including tensor names, shapes, and element types. No external schema or configuration is required.
- Extensible metadata -- arbitrary metadata (architecture name, tokenizer configuration, training parameters, quantization details) is stored as typed key-value pairs, allowing new metadata fields without format version changes.
- Standardized tensor naming -- tensors follow a consistent naming convention that encodes the layer index and parameter role, enabling generic model loaders.
- Alignment for memory mapping -- tensor data is aligned to configurable boundaries (default 32 bytes), allowing the file to be memory-mapped directly and tensor pointers to be used without copying.
- Versioned format -- the version field in the header enables backward-compatible format evolution while maintaining the ability to detect and reject unsupported future versions.
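The self-describing and versioned properties show up directly on the reading side. The following sketch parses the header, metadata, and tensor info of a GGUF buffer, rejecting input with the wrong magic or an unsupported version. Assumptions: little-endian byte order, only version 3 accepted, and the names parse_header and _Reader are hypothetical (this is not the ggml or gguf-py API). Every metadata value is decoded from its type tag alone, with no model-specific schema, and the start of the aligned data section is computed from general.alignment when present.

```python
import struct

SUPPORTED_VERSIONS = {3}  # assumption: this sketch only accepts version 3
# Fixed-size GGUF metadata value types -> struct format characters.
_SCALAR = {0: "B", 1: "b", 2: "H", 3: "h", 4: "I", 5: "i", 6: "f",
           7: "?", 10: "Q", 11: "q", 12: "d"}
_STRING, _ARRAY = 8, 9


class _Reader:
    """Sequential little-endian reader over a bytes-like buffer."""

    def __init__(self, buf):
        self.buf = buf
        self.pos = 0

    def read(self, fmt: str):
        (value,) = struct.unpack_from("<" + fmt, self.buf, self.pos)
        self.pos += struct.calcsize("<" + fmt)
        return value

    def string(self) -> str:
        n = self.read("Q")
        raw = bytes(self.buf[self.pos:self.pos + n])
        if len(raw) != n:
            raise ValueError("truncated string")
        self.pos += n
        return raw.decode("utf-8")

    def value(self, vtype: int):
        if vtype in _SCALAR:
            return self.read(_SCALAR[vtype])
        if vtype == _STRING:
            return self.string()
        if vtype == _ARRAY:  # element type tag, uint64 count, then elements
            etype = self.read("I")
            count = self.read("Q")
            return [self.value(etype) for _ in range(count)]
        raise ValueError(f"unknown metadata value type {vtype}")


def parse_header(buf) -> dict:
    if bytes(buf[:4]) != b"GGUF":
        raise ValueError("not a GGUF file")
    r = _Reader(buf)
    r.pos = 4
    version = r.read("I")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported GGUF version {version}")
    n_tensors = r.read("Q")
    n_kv = r.read("Q")

    metadata = {}
    for _ in range(n_kv):
        key = r.string()
        metadata[key] = r.value(r.read("I"))

    tensors = {}
    for _ in range(n_tensors):
        name = r.string()
        n_dims = r.read("I")
        shape = tuple(r.read("Q") for _ in range(n_dims))
        tensors[name] = {"shape": shape, "type": r.read("I"), "offset": r.read("Q")}

    alignment = metadata.get("general.alignment", 32)
    data_start = r.pos + (alignment - r.pos % alignment) % alignment
    return {"version": version, "metadata": metadata,
            "tensors": tensors, "data_start": data_start}
```

Because each value carries its own type tag, a reader can decode keys it has never seen before, which is what allows new metadata fields to appear without a version bump.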
Benefits Over Legacy GGML Format
- Extensible metadata -- the legacy format had a fixed header with limited fields; GGUF supports arbitrary key-value pairs of multiple types.
- Standardized tensor naming -- consistent naming conventions replace ad-hoc per-model naming schemes.
- Alignment for memory mapping -- configurable alignment lets loaders memory-map the file and use tensor data in place, without copying.
- Versioned format -- explicit versioning enables safe format evolution without breaking existing tooling.
- Type safety -- all metadata values carry explicit type tags, preventing misinterpretation of header fields.