Workflow: ggml-org/ggml MNIST Training and Evaluation

From Leeroopedia


Knowledge Sources
Domains Training, Inference, Computer_Vision
Last Updated 2026-02-10 08:00 GMT

Overview

End-to-end process for training and evaluating neural network models (fully connected and convolutional) on the MNIST handwritten digit dataset using GGML's native optimization framework.

Description

This workflow demonstrates the complete machine learning lifecycle using GGML: from data loading through model training to evaluation. It supports two model architectures (a fully connected network and a convolutional neural network) and two training paths (native GGML training with the ggml-opt API, or external training in PyTorch/TensorFlow followed by weight export to GGUF format). The GGML-native training path uses automatic differentiation, the AdamW optimizer, and the backend scheduler for hardware-accelerated training. Model weights are saved in GGUF format and can be evaluated with the same inference binary regardless of how they were trained.

Key outputs:

  • A trained MNIST classifier in GGUF format
  • Test set accuracy and loss metrics
  • Visual digit predictions for verification

Usage

Execute this workflow when you want to train a simple neural network from scratch using GGML's native training capabilities, or when you want to evaluate a pre-trained model exported from PyTorch or TensorFlow. This workflow serves as the reference example for using GGML's automatic differentiation and optimization APIs (ggml-opt), demonstrating how to build trainable computation graphs with gradient accumulation and batch processing.

Execution Steps

Step 1: Obtain Training Data

Download and prepare the MNIST dataset consisting of 60,000 training images and 10,000 test images of 28x28 pixel handwritten digits (0-9). The dataset is loaded from the standard IDX binary format files. Images are stored as raw unsigned byte arrays and labels as single-byte class indices. The data can be obtained from HuggingFace or downloaded automatically by the Python training scripts.

Key considerations:

  • Images are 28x28 grayscale pixels stored as uint8 values
  • Labels are integer values 0-9
  • Data is loaded into ggml_opt_dataset structures for batch iteration
  • Physical and logical batch sizes can differ to enable gradient accumulation
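
The IDX files described above have a deliberately simple layout: a big-endian integer header followed by raw bytes. The following Python sketch illustrates the parsing (the GGML example does this in C++; the function names here are illustrative):

```python
import struct

def load_idx_images(data: bytes):
    """Parse the IDX image format used by MNIST: a big-endian header
    (magic 2051, image count, rows, cols) followed by raw uint8 pixels."""
    magic, n, rows, cols = struct.unpack(">IIII", data[:16])
    assert magic == 2051, "not an IDX image file"
    pixels = data[16:]
    assert len(pixels) == n * rows * cols
    # one flat list of uint8 pixel values per image
    return [list(pixels[i * rows * cols:(i + 1) * rows * cols]) for i in range(n)]

def load_idx_labels(data: bytes):
    """Parse the IDX label format: magic 2049, label count, one byte per label."""
    magic, n = struct.unpack(">II", data[:8])
    assert magic == 2049, "not an IDX label file"
    return list(data[8:8 + n])
```

In the GGML example the parsed images and labels are then copied into a ggml_opt_dataset, which handles shuffling and batch iteration.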

Step 2: Initialize Model Architecture

Define the neural network architecture by creating GGML tensor structures for all trainable weight parameters. For the fully connected network: two linear layers with biases (784 inputs to 500 hidden units, then 500 hidden units to 10 output classes). For the convolutional network: two convolutional layers with pooling, followed by a dense output layer. Weights are either initialized randomly for training from scratch or loaded from a GGUF file containing pre-trained weights.

Key considerations:

  • Model architecture is determined by a string identifier ("mnist-fc" or "mnist-cnn")
  • Random initialization uses standard distributions appropriate for each layer type
  • When loading from GGUF, the architecture is inferred from metadata in the file
  • A separate static context holds the persistent weight tensors
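
As a sketch of random initialization at the mnist-fc layer shapes, here is an illustrative He-style initializer in Python; the sqrt(2/n_in) standard deviation is a common choice for ReLU networks, and the GGML example's exact distribution may differ:

```python
import math
import random

def init_fc_layer(n_in, n_out, rng=random):
    """He-style random initialization for one fully connected layer:
    weights ~ N(0, 2/n_in), biases zero. Illustrative only; the GGML
    example's exact initialization scheme may differ."""
    std = math.sqrt(2.0 / n_in)
    weight = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weight, bias

# mnist-fc shapes: 784 inputs -> 500 hidden units -> 10 classes
fc1_w, fc1_b = init_fc_layer(784, 500)
fc2_w, fc2_b = init_fc_layer(500, 10)
```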

Step 3: Configure Backend and Optimizer

Set up the compute backend infrastructure and optimization parameters. Initialize the backend scheduler with the preferred hardware accelerator and CPU fallback. Configure the AdamW optimizer with appropriate learning rate, batch sizes, and number of epochs. The optimizer manages gradient computation, accumulation across logical batches, and parameter updates.

Key considerations:

  • The backend scheduler automatically routes operations to the best available hardware
  • Gradient accumulation allows logical batch sizes larger than physical batch sizes
  • The ggml-opt API handles the training loop, loss computation, and parameter updates
  • A validation split can be specified to monitor overfitting during training
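
The AdamW update that the optimizer applies once per logical batch (after gradients from all physical batches have been accumulated) can be sketched for a single scalar parameter as follows. The hyperparameter defaults are the common AdamW values, not necessarily those used by the GGML example:

```python
import math

def adamw_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.0):
    """One AdamW update for a scalar parameter at step t (t starts at 1)."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    # decoupled weight decay: applied directly to the parameter
    param -= lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * param)
    return param, m, v
```

On the very first step the bias-corrected update is approximately lr * sign(grad), which is why AdamW takes uniformly sized steps early in training regardless of gradient magnitude.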

Step 4: Build Computation Graph

Construct the forward computation graph representing the neural network. For the fully connected network: matrix multiplications, bias additions, and ReLU activations. For the CNN: 2D convolutions, max pooling, and dense layers. The graph includes a cross-entropy loss node for training. GGML's automatic differentiation builds the backward graph by traversing the forward graph in reverse order.

Key considerations:

  • The computation graph is built once in a dedicated compute context
  • Input tensors are marked as graph inputs for data feeding
  • The logits tensor is marked as the graph output
  • For training, backward pass nodes are automatically generated
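
The mnist-fc forward graph reduces to two linear layers with a ReLU in between. An illustrative plain-Python version is shown below; in GGML the same computation is expressed as ggml_mul_mat, ggml_add, and ggml_relu graph nodes rather than eager function calls:

```python
def relu(xs):
    return [max(0.0, x) for x in xs]

def linear(weight, bias, xs):
    # weight is a list of rows, one row per output unit
    return [sum(w * x for w, x in zip(row, xs)) + b
            for row, b in zip(weight, bias)]

def forward_fc(params, image):
    """Forward pass of the mnist-fc graph: linear -> ReLU -> linear.
    Returns the 10 raw logits; the cross-entropy loss node is appended
    to this graph only for training."""
    hidden = relu(linear(params["fc1_w"], params["fc1_b"], image))
    return linear(params["fc2_w"], params["fc2_b"], hidden)
```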

Step 5: Train the Model

Execute the training loop using the ggml-opt optimization framework. For each epoch: shuffle the training dataset, iterate over mini-batches, feed image data through the forward graph, compute the cross-entropy loss, backpropagate gradients through the computation graph, accumulate gradients across the logical batch, and apply the AdamW parameter update. Optionally evaluate on a held-out validation set after each epoch.

Key considerations:

  • Training can be done natively in GGML or externally in PyTorch/TensorFlow
  • The native path uses ggml_opt_epoch for the complete training loop
  • External training exports weights to GGUF format via Python scripts
  • Both paths produce compatible GGUF model files
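
The shape of the training loop can be illustrated on a toy single-layer softmax classifier with hand-derived gradients. GGML instead obtains gradients via automatic differentiation and updates with AdamW; this sketch uses plain SGD and exists only to show the shuffle/forward/loss/backward/update cycle:

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def train_epoch(weight, bias, data, lr=0.1):
    """One epoch: shuffle, forward, cross-entropy loss, analytic gradient
    (softmax minus one-hot), immediate parameter update. Returns mean loss."""
    random.shuffle(data)
    total_loss = 0.0
    for x, y in data:
        logits = [sum(w * v for w, v in zip(row, x)) + b
                  for row, b in zip(weight, bias)]
        probs = softmax(logits)
        total_loss += -math.log(probs[y] + 1e-12)
        for k in range(len(weight)):
            g = probs[k] - (1.0 if k == y else 0.0)  # dL/dlogit_k
            for j in range(len(x)):
                weight[k][j] -= lr * g * x[j]
            bias[k] -= lr * g
    return total_loss / len(data)
```

Running several epochs on separable toy data drives the mean cross-entropy loss down from its initial value of ln(2) per sample.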

Step 6: Evaluate the Model

Run the trained model on the test set to compute accuracy and loss metrics. Load the GGUF model file, reconstruct the forward computation graph, iterate over all test images in batches, compare predicted classes against ground truth labels, and report aggregate test loss and accuracy with confidence intervals. Optionally display a random test image with its predicted digit for visual verification.

Key considerations:

  • Evaluation uses the same backend scheduler as training for hardware acceleration
  • The evaluation binary works with models from either training path
  • Per-image inference time is reported for performance benchmarking
  • A WebAssembly build enables browser-based interactive evaluation
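
Accuracy with an uncertainty estimate can be computed as sketched below. The 95% normal-approximation interval used here is an assumption for illustration; the GGML example reports accuracy with an uncertainty, but its exact interval formula may differ:

```python
import math

def evaluate(predictions, labels):
    """Test-set accuracy plus a 95% normal-approximation confidence
    half-width: 1.96 * sqrt(acc * (1 - acc) / n)."""
    n = len(labels)
    correct = sum(p == y for p, y in zip(predictions, labels))
    acc = correct / n
    ci = 1.96 * math.sqrt(acc * (1.0 - acc) / n)
    return acc, ci
```

On the full 10,000-image MNIST test set the half-width shrinks to well under one percentage point, which is why aggregate accuracy there is a stable comparison metric between the two training paths.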

GitHub URL

Workflow Repository