Implementation: ggml_opt_fit
Summary
The ggml_opt_fit function is the high-level training API in GGML that executes a complete training run: it constructs an optimizer context, splits the dataset into training and validation portions, and iterates over epochs by delegating to ggml_opt_epoch. It wraps the full training loop -- including dataset shuffling, forward and backward passes on training batches, forward-only evaluation on validation batches, gradient accumulation, and progress reporting -- into a single function call.
Import
#include "ggml-opt.h"
Dependencies
- ggml-opt.h -- public header defining the optimization API, including ggml_opt_fit, ggml_opt_epoch, dataset types, loss types, and optimizer types.
- ggml.h -- core GGML header providing tensor types, computation graph primitives, and context management.
Function Signature
void ggml_opt_fit(
ggml_backend_sched_t backend_sched,
struct ggml_context * ctx_compute,
struct ggml_tensor * inputs,
struct ggml_tensor * outputs,
ggml_opt_dataset_t dataset,
enum ggml_opt_loss_type loss_type,
enum ggml_opt_optimizer_type optimizer,
ggml_opt_get_optimizer_params get_opt_pars,
int64_t nepoch,
int64_t nbatch_logical,
float val_split,
bool silent);
Source: src/ggml-opt.cpp:L998-1078
Parameters
| Parameter | Type | Description |
|---|---|---|
| backend_sched | ggml_backend_sched_t | Backend scheduler handle that manages device selection and graph execution across one or more backends. |
| ctx_compute | struct ggml_context * | GGML context used for allocating intermediate computation tensors during forward and backward passes. |
| inputs | struct ggml_tensor * | Input tensor that receives batches of training data. Its shape must match the dataset's data-point dimensionality. |
| outputs | struct ggml_tensor * | Output tensor representing the model's predictions; connected to the loss computation graph. |
| dataset | ggml_opt_dataset_t | Dataset object (created via ggml_opt_dataset_init) containing all training samples and labels. |
| loss_type | enum ggml_opt_loss_type | Loss function to use for training. Typical value: GGML_OPT_LOSS_TYPE_CROSS_ENTROPY for classification tasks. |
| optimizer | enum ggml_opt_optimizer_type | Optimization algorithm. Typical value: GGML_OPT_OPTIMIZER_TYPE_ADAMW (Adam with decoupled weight decay). |
| get_opt_pars | ggml_opt_get_optimizer_params | Callback that returns custom optimizer parameters (learning rate, beta values, etc.). Can be NULL to use defaults. |
| nepoch | int64_t | Number of training epochs. Each epoch processes the entire training split once. |
| nbatch_logical | int64_t | Logical batch size for gradient accumulation. If larger than the physical shard size, gradients are accumulated over multiple shards before each parameter update. |
| val_split | float | Fraction of the dataset reserved for validation (e.g., 0.05 for 5%). Must be in the range [0.0, 1.0); a value of 0 disables validation. |
| silent | bool | When true, suppresses progress bars and per-epoch statistics output. |
Return Value
This function returns void. The model parameters are updated in-place through the tensors referenced by the computation graph. Training progress (loss, accuracy) is printed to standard output unless silent is true.
Internal Workflow
ggml_opt_fit orchestrates the full training run through the following steps:
- Build optimization context -- Allocates a ggml_opt_context_t configured with the specified loss type, optimizer, logical batch size, and optimizer parameter callback.
- Compute split sizes -- Uses val_split to partition the dataset into training samples and validation samples. Sizes are rounded to shard boundaries.
- Epoch loop -- For each epoch from 1 to nepoch:
  - Calls ggml_opt_epoch to execute one full training epoch. ggml_opt_epoch shuffles the dataset shards, iterates over training shards (forward + backward + accumulate + update), then iterates over validation shards (forward only).
  - Collects and reports training loss, training accuracy, validation loss, and validation accuracy.
- Cleanup -- Frees the optimization context and any temporary allocations.
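The split computation in the second step can be sketched as plain arithmetic. This is a simplified model rather than the exact ggml code, and it assumes the shard boundary coincides with the logical batch size:

```cpp
#include <cstdint>

// Simplified sketch (assumption: not the exact ggml implementation):
// choose the train/validation boundary from val_split, then round it
// down to a whole logical batch so no partial batch straddles the split.
int64_t train_split_size(int64_t ndata, float val_split, int64_t nbatch_logical) {
    int64_t idata_split = (int64_t)(ndata * (1.0f - val_split));
    idata_split -= idata_split % nbatch_logical; // round down to a batch boundary
    return idata_split;
}
```

With 60000 MNIST images, val_split = 0.05, and a logical batch of 512, this keeps 56832 samples for training and reserves the remaining 3168 for validation.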
Lower-Level API: ggml_opt_epoch
The per-epoch logic is handled by ggml_opt_epoch, which ggml_opt_fit calls internally:
void ggml_opt_epoch(
    ggml_opt_context_t opt_ctx,
    ggml_opt_dataset_t dataset,
    ggml_opt_result_t result_train,
    ggml_opt_result_t result_eval,
    int64_t idata_split,
    ggml_opt_epoch_callback callback_train,
    ggml_opt_epoch_callback callback_eval);
Here idata_split marks the index separating training data from validation data, and result_train / result_eval accumulate the loss and accuracy metrics for the two splits.
Source: src/ggml-opt.cpp:L880-923
For each epoch, ggml_opt_epoch performs:
- Shuffle the dataset shard order.
- Training iteration -- For each training shard: load data into input/output tensors, run the forward and backward computation graphs, accumulate gradients, and update parameters when a logical batch boundary is reached.
- Validation iteration -- For each validation shard: load data, run forward-only computation, and accumulate loss/accuracy metrics without updating weights.
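The interplay between physical shards and logical batches in the training iteration can be illustrated with a toy scalar simulation (hypothetical code, not ggml's implementation): gradients from consecutive shards are summed, and the parameter is updated only when a full logical batch has been accumulated.

```cpp
#include <cstdint>
#include <vector>

// Toy simulation of gradient accumulation (hypothetical, not ggml code).
// Each element of shard_grads stands for the gradient produced by one
// physical shard's backward pass; a parameter update happens only after
// shards_per_logical_batch shards have been accumulated.
double run_epoch(double param, double lr,
                 const std::vector<double> & shard_grads,
                 int64_t shards_per_logical_batch) {
    double  grad_acc = 0.0;
    int64_t n_acc    = 0;
    for (double g : shard_grads) {
        grad_acc += g; // accumulate this shard's gradient
        if (++n_acc == shards_per_logical_batch) {
            param -= lr * grad_acc / n_acc; // update on the mean gradient
            grad_acc = 0.0;                 // reset for the next logical batch
            n_acc    = 0;
        }
    }
    return param;
}
```

With four shards, two shards per logical batch, and a constant gradient of 1, the parameter is updated twice per epoch, each time by lr times the mean gradient.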
Usage Example: MNIST Training
The MNIST example provides a higher-level wrapper that demonstrates typical usage of ggml_opt_fit:
void mnist_model_train(
mnist_model & model,
ggml_opt_dataset_t dataset,
const int nepoch,
const float val_split);
Source: examples/mnist/mnist-common.cpp:L412-415
This function calls ggml_opt_fit with the model's backend scheduler, compute context, input and output tensors, and the provided training parameters. It uses GGML_OPT_LOSS_TYPE_CROSS_ENTROPY as the loss type and GGML_OPT_OPTIMIZER_TYPE_ADAMW as the optimizer:
#include "ggml-opt.h"
#include "mnist-common.h"

// Assuming model and dataset are already initialized:
mnist_model_train(model, dataset, /*nepoch=*/30, /*val_split=*/0.05f);

// Internally this calls:
// ggml_opt_fit(
//     model.backend_sched,
//     model.ctx_compute,
//     model.inputs,
//     model.outputs,
//     dataset,
//     GGML_OPT_LOSS_TYPE_CROSS_ENTROPY,
//     GGML_OPT_OPTIMIZER_TYPE_ADAMW,
//     NULL,    // get_opt_pars: use defaults
//     30,      // nepoch
//     512,     // nbatch_logical
//     0.05f,   // val_split: 5% for validation
//     false);  // silent: show progress