Implementation:Ggml org Ggml Mnist model init random

From Leeroopedia


Summary

mnist_model_init_random initializes an mnist_model struct with randomly generated weights and biases for either a fully connected or convolutional neural network architecture, ready for training from scratch on the MNIST handwritten digit dataset.

API Signature

mnist_model mnist_model_init_random(
    const std::string & arch,
    const std::string & backend,
    const int nbatch_logical,
    const int nbatch_physical
);

Source Location

  • File: examples/mnist/mnist-common.cpp, lines 241-310
  • Repository: ggml-org/ggml

Parameters

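| Parameter | Type | Description |
| --- | --- | --- |
| arch | const std::string & | Architecture selector. Must be "mnist-fc" (fully connected) or "mnist-cnn" (convolutional). |
| backend | const std::string & | Backend device selector. Pass "" for the default backend, or name one explicitly (e.g., "CPU", "CUDA0"). |
| nbatch_logical | const int | Logical batch size: the number of samples that contribute to one optimizer step. Gradients are accumulated across physical batches until this many samples have been processed. |
| nbatch_physical | const int | Physical batch size: the number of samples processed in a single forward/backward pass. The signature declares no default argument, so the value must be passed explicitly; 500 is the customary value. |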
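
The constraints on the parameters described above can be checked before the call. The following is a minimal sketch, not code from the repository: init_args_valid and accumulation_steps are hypothetical helpers, and the requirement that the logical batch size be a positive multiple of the physical batch size is an assumption about how gradient accumulation is usually set up.

```cpp
#include <string>

// Hypothetical helper (not part of ggml): validates the arguments that would
// be passed to mnist_model_init_random.
bool init_args_valid(const std::string & arch, int nbatch_logical, int nbatch_physical) {
    // Only the two architecture names documented on this page are accepted.
    const bool arch_ok = arch == "mnist-fc" || arch == "mnist-cnn";
    // Assumption: the logical batch is a positive multiple of the physical batch.
    const bool batch_ok = nbatch_physical > 0
        && nbatch_logical >= nbatch_physical
        && nbatch_logical % nbatch_physical == 0;
    return arch_ok && batch_ok;
}

// Hypothetical helper: number of forward/backward passes whose gradients are
// accumulated before one optimizer step.
int accumulation_steps(int nbatch_logical, int nbatch_physical) {
    return nbatch_logical / nbatch_physical;
}

// Intended use, assuming the ggml MNIST example sources are available:
//   if (init_args_valid("mnist-fc", 1000, 500)) {
//       mnist_model model = mnist_model_init_random("mnist-fc", "", 1000, 500);
//   }
```

With a logical batch of 1000 and a physical batch of 500, two physical batches are accumulated per optimizer step.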

Return Value

Returns an mnist_model struct containing all initialized tensors, contexts, and the backend scheduler. The contents vary by architecture:

Fully Connected (mnist-fc)

| Tensor | Shape | Description |
| --- | --- | --- |
| fc1_weight | [784, 500] | First fully connected layer weights |
| fc1_bias | [500] | First fully connected layer bias |
| fc2_weight | [500, 10] | Second fully connected layer weights (output) |
| fc2_bias | [10] | Second fully connected layer bias (output) |
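
To make the shapes listed above concrete, here is a rough sketch of how the four tensors combine in a forward pass, written with plain std::vector instead of ggml tensors. It is illustrative only: the ReLU between the two layers and the row-major storage (each output unit owns a contiguous row of inputs) are assumptions, and dense, fc_forward, and fc_param_count are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Dimensions taken from the shapes listed above.
constexpr std::size_t FC_N_INPUT  = 784; // 28 x 28 pixels
constexpr std::size_t FC_N_HIDDEN = 500;
constexpr std::size_t FC_N_CLASS  = 10;

// Total number of trainable parameters in the fully connected model.
constexpr std::size_t fc_param_count() {
    return FC_N_INPUT * FC_N_HIDDEN + FC_N_HIDDEN   // fc1_weight + fc1_bias
         + FC_N_HIDDEN * FC_N_CLASS + FC_N_CLASS;   // fc2_weight + fc2_bias
}

// y = W x + b for a weight of shape [n_in, n_out], assumed to be stored as
// n_out contiguous rows of n_in values.
std::vector<float> dense(const std::vector<float> & w, const std::vector<float> & b,
                         const std::vector<float> & x, std::size_t n_in, std::size_t n_out) {
    std::vector<float> y(n_out);
    for (std::size_t o = 0; o < n_out; ++o) {
        float acc = b[o];
        for (std::size_t i = 0; i < n_in; ++i) {
            acc += w[o * n_in + i] * x[i];
        }
        y[o] = acc;
    }
    return y;
}

// Forward pass: fc1 -> ReLU (assumed activation) -> fc2. Returns 10 logits.
std::vector<float> fc_forward(const std::vector<float> & fc1_weight, const std::vector<float> & fc1_bias,
                              const std::vector<float> & fc2_weight, const std::vector<float> & fc2_bias,
                              const std::vector<float> & image) {
    std::vector<float> hidden = dense(fc1_weight, fc1_bias, image, FC_N_INPUT, FC_N_HIDDEN);
    for (float & h : hidden) {
        h = std::max(h, 0.0f);
    }
    return dense(fc2_weight, fc2_bias, hidden, FC_N_HIDDEN, FC_N_CLASS);
}
```

The parameter count works out to 784 × 500 + 500 + 500 × 10 + 10 = 397,510 values.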

Convolutional (mnist-cnn)

| Tensor | Shape | Description |
| --- | --- | --- |
| conv1_kernel | [3, 3, 1, 8] | First convolution kernel (3x3, 1 input channel, 8 output channels) |
| conv1_bias | [1, 1, 8] | First convolution bias |
| conv2_kernel | [3, 3, 8, 16] | Second convolution kernel (3x3, 8 input channels, 16 output channels) |
| conv2_bias | [1, 1, 16] | Second convolution bias |
| dense_weight | [784, 10] | Dense output layer weights |
| dense_bias | [10] | Dense output layer bias |
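
The 784 inputs of the dense layer can be derived from the other shapes. The sketch below assumes that each convolution preserves the 28x28 spatial size and is followed by a 2x2 pooling step that halves it, which this page does not state explicitly; the constant and function names are hypothetical.

```cpp
#include <cstddef>

constexpr std::size_t CNN_IMG_HW  = 28; // MNIST image height and width
constexpr std::size_t CNN_KERNEL  = 3;  // 3x3 kernels
constexpr std::size_t CNN_CH_IN   = 1;  // grayscale input
constexpr std::size_t CNN_CH1     = 8;  // conv1 output channels
constexpr std::size_t CNN_CH2     = 16; // conv2 output channels
constexpr std::size_t CNN_N_CLASS = 10;

// Assumption: two size-preserving convolutions, each followed by 2x2 pooling,
// reduce 28x28 to 7x7 before the result is flattened for the dense layer.
constexpr std::size_t cnn_dense_inputs() {
    return (CNN_IMG_HW / 2 / 2) * (CNN_IMG_HW / 2 / 2) * CNN_CH2;
}

// Total number of trainable parameters in the convolutional model.
constexpr std::size_t cnn_param_count() {
    return CNN_KERNEL * CNN_KERNEL * CNN_CH_IN * CNN_CH1 + CNN_CH1   // conv1
         + CNN_KERNEL * CNN_KERNEL * CNN_CH1 * CNN_CH2 + CNN_CH2     // conv2
         + cnn_dense_inputs() * CNN_N_CLASS + CNN_N_CLASS;           // dense
}

// 7 * 7 * 16 matches the first dimension of dense_weight.
static_assert(cnn_dense_inputs() == 784, "dense input size should match dense_weight");
```

Under these assumptions the convolutional model has 9,098 parameters, far fewer than the fully connected variant, because the kernels are shared across all image positions.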

Common Fields (Both Architectures)

  • images — Input tensor for image data.
  • buf_ggml_ctx — Two ggml contexts:
    • ctx_static — Holds the model parameters (weights and biases) that persist across batches.
    • ctx_compute — Holds temporary tensors used during forward and backward computation.
  • Backend scheduler — Manages dispatching computation to the selected backend device.

Random Initialization Details

Weights are initialized using the C++ <random> standard library:

  • RNG: std::mt19937 (Mersenne Twister)
  • Distribution: std::normal_distribution with a small standard deviation

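std::mt19937 is a deterministic generator, so the initial weights are reproducible across runs only when it is seeded with a fixed value. Either way, the result is a small-magnitude pseudorandom starting point suitable for training from scratch.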
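
The scheme described above can be sketched as follows. This is an illustration rather than the function's actual body: random_weights is a hypothetical helper, the standard deviation of 1e-2 is an assumed stand-in for the "small standard deviation" mentioned above, and the explicit seed parameter is added only to demonstrate determinism.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: returns n samples drawn from a normal distribution
// with mean 0 and the given standard deviation (1e-2 is an assumed value).
std::vector<float> random_weights(std::size_t n, std::uint32_t seed, float stddev = 1e-2f) {
    std::mt19937 gen(seed);                            // Mersenne Twister RNG
    std::normal_distribution<float> nd(0.0f, stddev);  // zero-mean Gaussian
    std::vector<float> w(n);
    for (float & v : w) {
        v = nd(gen);
    }
    return w;
}
```

Calling random_weights twice with the same seed yields identical buffers in the same build, while seeding from std::random_device would give different weights on every run.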

Related Function: mnist_model_init_from_file

For loading pre-trained weights rather than initializing randomly, the companion function mnist_model_init_from_file is defined at lines 131-239 in the same source file. It deserializes model parameters from a GGUF file, enabling inference or fine-tuning without training from scratch.

Dependencies

| Header | Purpose |
| --- | --- |
| ggml.h | Core tensor operations and context management |
| ggml-backend.h | Backend abstraction (CPU, CUDA, etc.) |
| gguf.h | GGUF file format support (used by mnist_model_init_from_file) |
| <random> | C++ standard library random number generation |

Language

C++
