Implementation: ggml-org/ggml mnist_model_init_random
Summary
`mnist_model_init_random` initializes an `mnist_model` struct with randomly generated weights and biases for either a fully connected or a convolutional neural network architecture, ready for training from scratch on the MNIST handwritten-digit dataset.
API Signature
```cpp
mnist_model mnist_model_init_random(
    const std::string & arch,
    const std::string & backend,
    const int nbatch_logical,
    const int nbatch_physical
);
```
Source Location
- File: examples/mnist/mnist-common.cpp, lines 241-310
- Repository: ggml-org/ggml
Parameters
| Parameter | Type | Description |
|---|---|---|
| `arch` | `const std::string &` | Architecture selector. Must be `"mnist-fc"` (fully connected) or `"mnist-cnn"` (convolutional). |
| `backend` | `const std::string &` | Backend device selector. Pass `""` for the default backend, or specify a device explicitly (e.g., `"CPU"`, `"CUDA0"`). |
| `nbatch_logical` | `int` | Logical batch size used for gradient accumulation. |
| `nbatch_physical` | `int` | Physical batch size: the number of samples processed in a single forward/backward pass. Default is 500. |
Return Value
Returns an mnist_model struct containing all initialized tensors, contexts, and the backend scheduler. The contents vary by architecture:
Fully Connected (mnist-fc)
| Tensor | Shape | Description |
|---|---|---|
| `fc1_weight` | [784, 500] | First fully connected layer weights |
| `fc1_bias` | [500] | First fully connected layer bias |
| `fc2_weight` | [500, 10] | Second fully connected layer weights (output) |
| `fc2_bias` | [10] | Second fully connected layer bias (output) |
Convolutional (mnist-cnn)
| Tensor | Shape | Description |
|---|---|---|
| `conv1_kernel` | [3, 3, 1, 8] | First convolution kernel (3x3, 1 input channel, 8 output channels) |
| `conv1_bias` | [1, 1, 8] | First convolution bias |
| `conv2_kernel` | [3, 3, 8, 16] | Second convolution kernel (3x3, 8 input channels, 16 output channels) |
| `conv2_bias` | [1, 1, 16] | Second convolution bias |
| `dense_weight` | [784, 10] | Dense output layer weights |
| `dense_bias` | [10] | Dense output layer bias |
Common Fields (Both Architectures)
- `images`: input tensor for image data.
- Two `ggml` contexts:
  - `ctx_static`: holds the model parameters (weights and biases) that persist across batches.
  - `ctx_compute`: holds temporary tensors used during forward and backward computation.
- Backend scheduler: manages dispatching computation to the selected backend device.
Random Initialization Details
Weights are initialized using the C++ `<random>` standard library:
- RNG: `std::mt19937` (Mersenne Twister)
- Distribution: `std::normal_distribution` with a small standard deviation
This provides reproducible pseudorandom initialization suitable for training from scratch.
Related Function: mnist_model_init_from_file
For loading pre-trained weights rather than initializing randomly, the companion function `mnist_model_init_from_file` is defined at lines 131-239 of the same source file. It deserializes model parameters from a GGUF file, enabling inference or fine-tuning without training from scratch.
Dependencies
| Header | Purpose |
|---|---|
| `ggml.h` | Core tensor operations and context management |
| `ggml-backend.h` | Backend abstraction (CPU, CUDA, etc.) |
| `gguf.h` | GGUF file format support (used by `mnist_model_init_from_file`) |
| `<random>` | C++ standard library random number generation |
Language
C++
Related
- Principle:Ggml_org_Ggml_Model_Architecture_Initialization
- Environment:Ggml_org_Ggml_C_Cpp_Build_Environment
- Heuristic:Ggml_org_Ggml_Gradient_Accumulation_Batch_Sizing