Implementation: ggml-org/ggml mnist_model_init_random
Summary
`mnist_model_init_random` initializes an `mnist_model` struct with randomly generated weights and biases for either a fully connected or a convolutional neural network architecture, ready for training from scratch on the MNIST handwritten-digit dataset.
API Signature
```cpp
mnist_model mnist_model_init_random(
    const std::string & arch,
    const std::string & backend,
    const int nbatch_logical,
    const int nbatch_physical
);
```
Source Location
- File: examples/mnist/mnist-common.cpp, lines 241-310
- Repository: ggml-org/ggml
Parameters
| Parameter | Type | Description |
|---|---|---|
| `arch` | `const std::string &` | Architecture selector. Must be `"mnist-fc"` (fully connected) or `"mnist-cnn"` (convolutional). |
| `backend` | `const std::string &` | Backend device selector. Pass `""` for the default backend, or specify a device explicitly (e.g., `"CPU"`, `"CUDA0"`). |
| `nbatch_logical` | `int` | Logical batch size used for gradient accumulation. |
| `nbatch_physical` | `int` | Physical batch size: the number of samples processed in a single forward/backward pass. Default is 500. |
Return Value
Returns an mnist_model struct containing all initialized tensors, contexts, and the backend scheduler. The contents vary by architecture:
Fully Connected (mnist-fc)
| Tensor | Shape | Description |
|---|---|---|
| `fc1_weight` | [784, 500] | First fully connected layer weights |
| `fc1_bias` | [500] | First fully connected layer bias |
| `fc2_weight` | [500, 10] | Second fully connected layer weights (output) |
| `fc2_bias` | [10] | Second fully connected layer bias (output) |
Convolutional (mnist-cnn)
| Tensor | Shape | Description |
|---|---|---|
| `conv1_kernel` | [3, 3, 1, 8] | First convolution kernel (3x3, 1 input channel, 8 output channels) |
| `conv1_bias` | [1, 1, 8] | First convolution bias |
| `conv2_kernel` | [3, 3, 8, 16] | Second convolution kernel (3x3, 8 input channels, 16 output channels) |
| `conv2_bias` | [1, 1, 16] | Second convolution bias |
| `dense_weight` | [784, 10] | Dense output layer weights |
| `dense_bias` | [10] | Dense output layer bias |
Common Fields (Both Architectures)
- `images`: input tensor for image data.
- Two `ggml` contexts:
  - `ctx_static`: holds the model parameters (weights and biases) that persist across batches.
  - `ctx_compute`: holds temporary tensors used during forward and backward computation.
- Backend scheduler: manages dispatching computation to the selected backend device.
Random Initialization Details
Weights are initialized using the C++ `<random>` standard library:
- RNG: `std::mt19937` (Mersenne Twister)
- Distribution: `std::normal_distribution` with a small standard deviation
This provides reproducible pseudorandom initialization suitable for training from scratch.
Related Function: mnist_model_init_from_file
For loading pre-trained weights rather than initializing randomly, the companion function `mnist_model_init_from_file` is defined at lines 131-239 of the same source file. It deserializes model parameters from a GGUF file, enabling inference or fine-tuning without training from scratch.
Dependencies
| Header | Purpose |
|---|---|
| `ggml.h` | Core tensor operations and context management |
| `ggml-backend.h` | Backend abstraction (CPU, CUDA, etc.) |
| `gguf.h` | GGUF file format support (used by `mnist_model_init_from_file`) |
| `<random>` | C++ standard library random number generation |
Language
C++
Related
- Principle:Ggml_org_Ggml_Model_Architecture_Initialization
- Environment:Ggml_org_Ggml_C_Cpp_Build_Environment
- Heuristic:Ggml_org_Ggml_Gradient_Accumulation_Batch_Sizing