Implementation:Ggml_org_Ggml_Magika_inference

**Implementation Metadata**
File Name	`examples/magika/main.cpp`
Repository	ggml-org/ggml
Lines	374
Language	C++
Domain Tags	ML_Inference, File_Classification, Example
Status	Active
Last Updated	2025-05-15 12:00 GMT
Knowledge Sources	ggml-org/ggml repository

Overview

`examples/magika/main.cpp` is a C++ implementation of Google's Magika file type detector that uses GGML for inference. It serves as a practical GGML inference example: it classifies file content with a pre-trained neural network and demonstrates GGUF model loading, compute graph construction, and batch inference.

Description

The file defines 113 file type labels (from "ai" to "zlibstream") and the following structures:

magika_hparams -- Model hyperparameters: block_size=4096, beg_size=512, mid_size=512, end_size=512, min_file_size_for_dl=16, n_label=113, f_norm_eps=0.001, padding_token=256
magika_model -- Model with dense layers, layer normalization, and target label output layers
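
To make the role of `f_norm_eps` concrete, here is a minimal standalone sketch of layer normalization as it is commonly defined (normalize by mean and population variance, then scale and shift). The helper name `layer_norm` and its signature are hypothetical and not taken from `main.cpp`; the example itself builds this step from GGML graph operations rather than plain loops.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of main.cpp): layer normalization over one vector.
// eps plays the role of magika_hparams::f_norm_eps and keeps the division stable
// when the variance is close to zero.
std::vector<float> layer_norm(const std::vector<float> & x,
                              const std::vector<float> & gamma,
                              const std::vector<float> & beta,
                              float eps = 0.001f) {
    assert(!x.empty());
    assert(gamma.size() == x.size() && beta.size() == x.size());

    // Mean of the input.
    float mean = 0.0f;
    for (float v : x) mean += v;
    mean /= static_cast<float>(x.size());

    // Population variance of the input.
    float var = 0.0f;
    for (float v : x) var += (v - mean) * (v - mean);
    var /= static_cast<float>(x.size());

    // Normalize, then apply the learned scale (gamma) and shift (beta).
    const float inv_std = 1.0f / std::sqrt(var + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = (x[i] - mean) * inv_std * gamma[i] + beta[i];
    }
    return y;
}
```

In this sketch, `gamma` and `beta` correspond conceptually to the `layer_norm_gamma` and `layer_norm_beta` tensors of `magika_model`.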

The inference pipeline:

Loads model weights from GGUF format via gguf_init_from_file
Reads file content and extracts beginning, middle, and end byte segments
Constructs a compute graph with dense layers, layer normalization, and softmax
Runs inference via ggml_backend_graph_compute
Returns the highest-scoring file type label
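
The segment extraction step can be sketched as follows. This is a simplified illustration that works on an in-memory byte buffer; the helper name `extract_segments` is hypothetical, and the alignment of short inputs within each segment (left-aligned beginning and middle, right-aligned end) is an assumption rather than a description of what `main.cpp` does. Bytes map to tokens 0..255, and unfilled positions keep the padding token 256.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of main.cpp): builds a token sequence of
// 3 * seg_size entries (beginning, middle, end) from a byte buffer.
std::vector<int> extract_segments(const std::vector<std::uint8_t> & data,
                                  std::size_t seg_size = 512,
                                  int padding_token = 256) {
    assert(seg_size > 0);

    // Start fully padded; real bytes overwrite the padding below.
    std::vector<int> tokens(3 * seg_size, padding_token);

    // Number of bytes each segment can take from this input.
    const std::size_t n = std::min(data.size(), seg_size);

    // Beginning: first n bytes, left-aligned in segment 0.
    for (std::size_t i = 0; i < n; ++i) tokens[i] = data[i];

    // Middle: n bytes centered in the input, left-aligned in segment 1 (assumption).
    const std::size_t mid_off = (data.size() - n) / 2;
    for (std::size_t i = 0; i < n; ++i) tokens[seg_size + i] = data[mid_off + i];

    // End: last n bytes, right-aligned in segment 2 (assumption).
    const std::size_t end_off = data.size() - n;
    for (std::size_t i = 0; i < n; ++i) tokens[3 * seg_size - n + i] = data[end_off + i];

    return tokens;
}
```

With the default hyperparameters this yields 1536 tokens per file, which the model then consumes as its input.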

Usage

# Build the magika example
cmake -B build
cmake --build build --target magika

# Run file type detection (model path first, then one or more files to classify)
./build/bin/magika magika.gguf input_file.bin

Code Reference

Source Location

Repository	File	Lines
ggml-org/ggml	`examples/magika/main.cpp`	374

Key Signatures

static const char * magika_labels[] = {
    "ai", "apk", "appleplist", "asm", "asp", ...  // 113 labels
};

struct magika_hparams {
    const int block_size = 4096;
    const int beg_size = 512;
    const int mid_size = 512;
    const int end_size = 512;
    const int min_file_size_for_dl = 16;
    const int n_label = 113;
    const float f_norm_eps = 0.001f;
    const int padding_token = 256;
};

struct magika_model {
    struct ggml_tensor * dense_w, * dense_b;
    struct ggml_tensor * layer_norm_gamma, * layer_norm_beta;
    struct ggml_tensor * dense_1_w, * dense_1_b;
    struct ggml_tensor * dense_2_w, * dense_2_b;
    struct ggml_tensor * target_label_w, * target_label_b;
    ggml_backend_t backend = ggml_backend_cpu_init();
};

bool magika_model_load(const std::string & fname, magika_model & model);
struct ggml_tensor * checked_get_tensor(struct ggml_context * ctx, const char * name);

I/O Contract

Inputs

Model file -- GGUF-format Magika model weights
Input file -- Any file to classify (reads first/middle/last 512 bytes)
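
Because the model file must be in GGUF format, a caller may want a cheap sanity check before handing the path to the loader. GGUF files begin with the four ASCII bytes `GGUF`; the sketch below tests only that magic and nothing else (no version or tensor validation). The helper name `has_gguf_magic` is hypothetical and is not part of `main.cpp` or the GGML API.

```cpp
#include <cassert>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper (not part of main.cpp): returns true if the file at
// `path` starts with the 4-byte GGUF magic. Returns false if the file cannot
// be opened or is shorter than 4 bytes.
bool has_gguf_magic(const std::string & path) {
    assert(!path.empty());

    std::ifstream in(path, std::ios::binary);
    char magic[4] = {0, 0, 0, 0};
    if (!in.read(magic, sizeof(magic))) {
        return false;
    }
    return std::memcmp(magic, "GGUF", sizeof(magic)) == 0;
}
```

A check like this only rejects obviously wrong inputs early; a file that passes it can still fail to load if it does not contain the expected Magika tensors.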

Outputs

File type label -- One of 113 content type labels (e.g., "pdf", "jpeg", "python")
Confidence score -- Softmax probability for the predicted label
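
The relationship between the two outputs can be illustrated with a small standalone sketch: given one raw score per label, the predicted label is the index of the largest score, and the confidence is that label's softmax probability. The names `prediction` and `top_prediction` are hypothetical. In the example itself, softmax is part of the compute graph, so this sketch only mirrors the arithmetic.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type (not part of main.cpp).
struct prediction {
    std::size_t index; // position of the winning label
    float       prob;  // softmax probability of that label
};

// Hypothetical helper: argmax plus the softmax probability of the winner.
prediction top_prediction(const std::vector<float> & logits) {
    assert(!logits.empty());

    // Argmax over the raw scores.
    std::size_t best = 0;
    for (std::size_t i = 1; i < logits.size(); ++i) {
        if (logits[i] > logits[best]) best = i;
    }

    // Softmax denominator, shifted by the maximum for numerical stability.
    float sum = 0.0f;
    for (float v : logits) sum += std::exp(v - logits[best]);

    // The winner's numerator is exp(0) == 1 after the shift.
    return { best, 1.0f / sum };
}
```

With 113 scores as input, `magika_labels[result.index]` would give the label string and `result.prob` the confidence score.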

Usage Examples

File type detection:

#include <cstdio>
#include "ggml.h"
#include "gguf.h"

// Load model weights from GGUF; magika_model_load returns false on failure
magika_model model;
if (!magika_model_load("magika.gguf", model)) {
    fprintf(stderr, "failed to load model\n");
    return 1;
}

// Classify a file:
// the model processes the beginning (512 bytes), middle (512 bytes),
// and end (512 bytes) segments of the input file
// and returns a label such as "python", "jpeg", or "pdf".

Related Pages

Implements Principle

Principle:Ggml_org_Ggml_File_Type_Detection

Related Implementations

Implementation:Ggml_org_Ggml_Ggml_init -- Context initialization
Implementation:Ggml_org_Ggml_Gguf_init_empty -- GGUF format handling
Implementation:Ggml_org_Ggml_Ggml_build_forward_expand -- Graph construction
