Workflow: ggml-org/ggml MNIST Training and Evaluation
| Knowledge Sources | |
|---|---|
| Domains | Training, Inference, Computer_Vision |
| Last Updated | 2026-02-10 08:00 GMT |
Overview
End-to-end process for training and evaluating neural network models (fully connected and convolutional) on the MNIST handwritten digit dataset using GGML's native optimization framework.
Description
This workflow demonstrates the complete machine learning lifecycle using GGML: from data loading through model training to evaluation. It supports two model architectures (fully connected network and convolutional neural network) and two training paths (native GGML training with the ggml-opt API, or external training in PyTorch/TensorFlow with weight export to GGUF format). The GGML-native training path uses automatic differentiation, AdamW optimizer, and the backend scheduler for hardware-accelerated training. Model weights are saved in GGUF format and can be evaluated using the same inference binary regardless of how they were trained.
Key outputs:
- A trained MNIST classifier in GGUF format
- Test set accuracy and loss metrics
- Visual digit predictions for verification
Usage
Execute this workflow when you want to train a simple neural network from scratch using GGML's native training capabilities, or when you want to evaluate a pre-trained model exported from PyTorch or TensorFlow. This workflow serves as the reference example for using GGML's automatic differentiation and optimization APIs (ggml-opt), demonstrating how to build trainable computation graphs with gradient accumulation and batch processing.
Execution Steps
Step 1: Obtain Training Data
Download and prepare the MNIST dataset consisting of 60,000 training images and 10,000 test images of 28x28 pixel handwritten digits (0-9). The dataset is loaded from the standard IDX binary format files. Images are stored as raw unsigned byte arrays and labels as single-byte class indices. The data can be obtained from HuggingFace or downloaded automatically by the Python training scripts.
Key considerations:
- Images are 28x28 grayscale pixels stored as uint8 values
- Labels are integer values 0-9
- Data is loaded into ggml_opt_dataset structures for batch iteration
- Physical and logical batch sizes can differ to enable gradient accumulation
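The IDX layout described above can be sketched in plain Python. This is an illustrative parser, not the loader used by the GGML example; the header is big-endian (two zero bytes, a dtype code, the number of dimensions, then the dimension sizes as uint32), followed by the raw uint8 payload:

```python
import struct
import numpy as np

def load_idx(data: bytes) -> np.ndarray:
    """Parse an IDX binary buffer into a NumPy array of uint8."""
    zero1, zero2, dtype_code, ndim = struct.unpack(">BBBB", data[:4])
    assert zero1 == 0 and zero2 == 0 and dtype_code == 0x08  # 0x08 = uint8 payload
    dims = struct.unpack(f">{ndim}I", data[4:4 + 4 * ndim])
    payload = np.frombuffer(data, dtype=np.uint8, offset=4 + 4 * ndim)
    return payload.reshape(dims)

# Build a tiny synthetic IDX buffer: 2 "images" of 28x28 uint8 pixels.
images = np.arange(2 * 28 * 28, dtype=np.uint8).reshape(2, 28, 28)
buf = (struct.pack(">BBBB", 0, 0, 0x08, 3)
       + struct.pack(">3I", 2, 28, 28)
       + images.tobytes())
parsed = load_idx(buf)
print(parsed.shape)  # (2, 28, 28)
```

The label files use the same header with a single dimension and one byte per class index.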
Step 2: Initialize Model Architecture
Define the neural network architecture by creating GGML tensor structures for all trainable weight parameters. For the fully connected network: two linear layers (784 inputs to 500 hidden units, then 500 to 10 output classes), each with a bias. For the convolutional network: two convolutional layers with pooling, followed by a dense output layer. Weights are either initialized randomly for training from scratch or loaded from a GGUF file containing pre-trained weights.
Key considerations:
- Model architecture is determined by a string identifier ("mnist-fc" or "mnist-cnn")
- Random initialization uses standard distributions appropriate for each layer type
- When loading from GGUF, the architecture is inferred from metadata in the file
- A separate static context holds the persistent weight tensors
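Random initialization for the fully connected variant can be sketched as follows. This is a NumPy stand-in: He (Kaiming) initialization is an assumed choice suited to ReLU layers, not necessarily the exact scheme the GGML example uses.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_linear(fan_in: int, fan_out: int):
    # He initialization: weights ~ N(0, sqrt(2 / fan_in)), biases zero.
    w = rng.normal(0.0, np.sqrt(2.0 / fan_in),
                   size=(fan_in, fan_out)).astype(np.float32)
    b = np.zeros(fan_out, dtype=np.float32)
    return w, b

# mnist-fc: 784 inputs -> 500 hidden units -> 10 classes
params = {
    "fc1": init_linear(28 * 28, 500),
    "fc2": init_linear(500, 10),
}
```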
Step 3: Configure Backend and Optimizer
Set up the compute backend infrastructure and optimization parameters. Initialize the backend scheduler with the preferred hardware accelerator and CPU fallback. Configure the AdamW optimizer with appropriate learning rate, batch sizes, and number of epochs. The optimizer manages gradient computation, accumulation across logical batches, and parameter updates.
Key considerations:
- The backend scheduler automatically routes operations to the best available hardware
- Gradient accumulation allows logical batch sizes larger than physical batch sizes
- The ggml-opt API handles the training loop, loss computation, and parameter updates
- A validation split can be specified to monitor overfitting during training
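The physical-versus-logical batch distinction rests on a simple identity: averaging the gradients of equal-sized micro-batches reproduces the gradient of the full logical batch. A minimal NumPy demonstration (toy mean-squared-error objective, not GGML code):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(32, 4)).astype(np.float32)  # one logical batch of 32
y = rng.normal(size=(32, 1)).astype(np.float32)
w = np.zeros((4, 1), dtype=np.float32)

def grad_mse(xb, yb, w):
    # Gradient of 0.5 * mean((xb @ w - yb)^2) with respect to w.
    return xb.T @ (xb @ w - yb) / len(xb)

# Physical batch of 8, logical batch of 32: accumulate 4 micro-batch
# gradients, then average before the single optimizer step.
acc = np.zeros_like(w)
for i in range(0, 32, 8):
    acc += grad_mse(x[i:i+8], y[i:i+8], w)
g_accum = acc / 4

g_full = grad_mse(x, y, w)  # gradient of the whole logical batch at once
print(np.allclose(g_accum, g_full, atol=1e-5))  # True
```

This is why gradient accumulation lets memory-constrained backends train with effectively large batch sizes.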
Step 4: Build Computation Graph
Construct the forward computation graph representing the neural network. For the fully connected network: matrix multiplications, bias additions, and ReLU activations. For the CNN: 2D convolutions, max pooling, and dense layers. The graph includes a cross-entropy loss node for training. GGML's automatic differentiation builds the backward graph by traversing the forward graph in reverse order.
Key considerations:
- The computation graph is built once in a dedicated compute context
- Input tensors are marked as graph inputs for data feeding
- The logits tensor is marked as the graph output
- For training, backward pass nodes are automatically generated
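The forward graph for the fully connected network, plus the cross-entropy loss node, can be written out in NumPy to show what the GGML graph computes (an illustrative sketch, not the actual graph-building code):

```python
import numpy as np

def forward(x, w1, b1, w2, b2):
    """mnist-fc forward pass: dense -> ReLU -> dense."""
    h = np.maximum(x @ w1 + b1, 0.0)  # hidden activations
    return h @ w2 + b2                # logits (the graph output)

def cross_entropy(logits, labels):
    # Numerically stable softmax cross-entropy, averaged over the batch.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 784)).astype(np.float32)    # a batch of 8 images
w1 = rng.normal(0, 0.05, (784, 500)).astype(np.float32)
b1 = np.zeros(500, np.float32)
w2 = rng.normal(0, 0.05, (500, 10)).astype(np.float32)
b2 = np.zeros(10, np.float32)
labels = rng.integers(0, 10, size=8)
loss = cross_entropy(forward(x, w1, b1, w2, b2), labels)
```

In GGML the same structure is expressed as tensor operations in a compute context, and the backward graph is derived automatically from it.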
Step 5: Train the Model
Execute the training loop using the ggml-opt optimization framework. For each epoch: shuffle the training dataset, iterate over mini-batches, feed image data through the forward graph, compute the cross-entropy loss, backpropagate gradients through the computation graph, accumulate gradients across the logical batch, and apply the AdamW parameter update. Optionally evaluate on a held-out validation set after each epoch.
Key considerations:
- Training can be done natively in GGML or externally in PyTorch/TensorFlow
- The native path uses ggml_opt_epoch for the complete training loop
- External training exports weights to GGUF format via Python scripts
- Both paths produce compatible GGUF model files
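The shape of the native training loop (shuffle, mini-batch, forward, backward, AdamW update) can be sketched end-to-end on a toy problem. Everything here is an illustrative NumPy stand-in with hand-derived gradients for a two-layer net; the real workflow delegates all of this to the ggml-opt framework:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for MNIST: 256 samples, 16 features, 10 classes.
n, d, h, c = 256, 16, 32, 10
x = rng.normal(size=(n, d)).astype(np.float32)
y = rng.integers(0, c, size=n)

params = {
    "w1": rng.normal(0, np.sqrt(2.0 / d), (d, h)).astype(np.float32),
    "b1": np.zeros(h, np.float32),
    "w2": rng.normal(0, np.sqrt(2.0 / h), (h, c)).astype(np.float32),
    "b2": np.zeros(c, np.float32),
}
state = {k: [np.zeros_like(v), np.zeros_like(v)] for k, v in params.items()}

def step(xb, yb, t, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8, wd=1e-4):
    # Forward: dense -> ReLU -> dense -> softmax cross-entropy.
    hpre = xb @ params["w1"] + params["b1"]
    hact = np.maximum(hpre, 0.0)
    logits = hact @ params["w2"] + params["b2"]
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z); p /= p.sum(axis=1, keepdims=True)
    loss = -np.log(p[np.arange(len(yb)), yb] + 1e-12).mean()
    # Backward: hand-derived gradients for this two-layer net.
    dlogits = p.copy(); dlogits[np.arange(len(yb)), yb] -= 1.0
    dlogits /= len(yb)
    grads = {"w2": hact.T @ dlogits, "b2": dlogits.sum(0)}
    dh = (dlogits @ params["w2"].T) * (hpre > 0)
    grads["w1"] = xb.T @ dh; grads["b1"] = dh.sum(0)
    # AdamW: bias-corrected moments plus decoupled weight decay.
    for k, g in grads.items():
        m, v = state[k]
        m[...] = beta1 * m + (1 - beta1) * g
        v[...] = beta2 * v + (1 - beta2) * g * g
        mhat = m / (1 - beta1 ** t); vhat = v / (1 - beta2 ** t)
        params[k] -= lr * (mhat / (np.sqrt(vhat) + eps) + wd * params[k])
    return loss

losses, t = [], 0
for epoch in range(30):
    perm = rng.permutation(n)          # shuffle each epoch
    for i in range(0, n, 64):          # mini-batches of 64
        t += 1
        losses.append(step(x[perm[i:i+64]], y[perm[i:i+64]], t))
print(losses[0], losses[-1])           # loss should fall from ~ln(10)
```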
Step 6: Evaluate the Model
Run the trained model on the test set to compute accuracy and loss metrics. Load the GGUF model file, reconstruct the forward computation graph, iterate over all test images in batches, compare predicted classes against ground truth labels, and report aggregate test loss and accuracy with confidence intervals. Optionally display a random test image with its predicted digit for visual verification.
Key considerations:
- Evaluation uses the same backend scheduler as training for hardware acceleration
- The evaluation binary works with models from either training path
- Per-image inference time is reported for performance benchmarking
- A WebAssembly build enables browser-based interactive evaluation
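The accuracy-with-confidence-interval report can be sketched as follows. The 95% normal-approximation interval (p ± 1.96·sqrt(p(1-p)/n)) is an assumed choice for illustration; the simulated predictions stand in for real model output:

```python
import numpy as np

def accuracy_with_ci(pred, truth, z=1.96):
    """Accuracy plus a 95% normal-approximation confidence half-width."""
    correct = (pred == truth)
    n = len(correct)
    p = correct.mean()
    half = z * np.sqrt(p * (1 - p) / n)
    return p, half

rng = np.random.default_rng(0)
truth = rng.integers(0, 10, size=10000)   # 10k test labels, like MNIST
pred = truth.copy()
flip = rng.random(10000) < 0.02           # simulate ~98% accuracy
pred[flip] = (pred[flip] + 1) % 10
p, half = accuracy_with_ci(pred, truth)
print(f"accuracy = {p*100:.2f}% +/- {half*100:.2f}%")
```

On the 10,000-image MNIST test set this interval is roughly a quarter of a percentage point wide at 98% accuracy, which is why small accuracy differences between runs are often not meaningful.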