Implementation: ggml_opt_fit
Summary
The ggml_opt_fit function is the high-level training API in GGML that executes a complete training run: it constructs an optimizer context, splits the dataset into training and validation portions, and iterates over epochs by delegating to ggml_opt_epoch. It wraps the full training loop -- including dataset shuffling, forward and backward passes on training batches, forward-only evaluation on validation batches, gradient accumulation, and progress reporting -- into a single function call.
Import
#include "ggml-opt.h"
Dependencies
- ggml-opt.h -- public header defining the optimization API, including ggml_opt_fit, ggml_opt_epoch, dataset types, loss types, and optimizer types.
- ggml.h -- core GGML header providing tensor types, computation graph primitives, and context management.
Function Signature
void ggml_opt_fit(
ggml_backend_sched_t backend_sched,
struct ggml_context * ctx_compute,
struct ggml_tensor * inputs,
struct ggml_tensor * outputs,
ggml_opt_dataset_t dataset,
enum ggml_opt_loss_type loss_type,
enum ggml_opt_optimizer_type optimizer,
ggml_opt_get_optimizer_params get_opt_pars,
int64_t nepoch,
int64_t nbatch_logical,
float val_split,
bool silent);
Source: src/ggml-opt.cpp:L998-1078
Parameters
| Parameter | Type | Description |
|---|---|---|
| backend_sched | ggml_backend_sched_t | Backend scheduler handle that manages device selection and graph execution across one or more backends. |
| ctx_compute | struct ggml_context * | GGML context used for allocating intermediate computation tensors during forward and backward passes. |
| inputs | struct ggml_tensor * | Input tensor that receives batches of training data. Its shape must match the dataset's data-point dimensionality. |
| outputs | struct ggml_tensor * | Output tensor representing the model's predictions; connected to the loss computation graph. |
| dataset | ggml_opt_dataset_t | Dataset object (created via ggml_opt_dataset_init) containing all training samples and labels. |
| loss_type | enum ggml_opt_loss_type | Loss function to use for training. Typical value: GGML_OPT_LOSS_TYPE_CROSS_ENTROPY for classification tasks. |
| optimizer | enum ggml_opt_optimizer_type | Optimization algorithm. Typical value: GGML_OPT_OPTIMIZER_TYPE_ADAMW (Adam with decoupled weight decay). |
| get_opt_pars | ggml_opt_get_optimizer_params | Callback that returns custom optimizer parameters (learning rate, beta values, etc.). Can be NULL to use defaults. |
| nepoch | int64_t | Number of training epochs. Each epoch processes the entire training split once. |
| nbatch_logical | int64_t | Logical batch size for gradient accumulation. If larger than the physical shard size, gradients are accumulated over multiple shards before each parameter update. |
| val_split | float | Fraction of the dataset reserved for validation (e.g., 0.05 for 5%). Must be in the range [0.0, 1.0); a value of 0 disables validation. |
| silent | bool | When true, suppresses progress bars and per-epoch statistics output. |
Return Value
This function returns void. The model parameters are updated in-place through the tensors referenced by the computation graph. Training progress (loss, accuracy) is printed to standard output unless silent is true.
Internal Workflow
ggml_opt_fit orchestrates the full training run through the following steps:
- Build optimization context -- Allocates a ggml_opt_context_t configured with the specified loss type, optimizer, logical batch size, and optimizer parameter callback.
- Compute split sizes -- Uses val_split to partition the dataset into training samples and validation samples. Sizes are rounded to shard boundaries.
- Epoch loop -- For each epoch from 1 to nepoch:
  - Calls ggml_opt_epoch to execute one full training epoch. ggml_opt_epoch shuffles the dataset shards, iterates over training shards (forward + backward + accumulate + update), then iterates over validation shards (forward only).
  - Collects and reports training loss, training accuracy, validation loss, and validation accuracy.
- Cleanup -- Frees the optimization context and any temporary allocations.
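The split computation in the second step can be sketched as plain arithmetic. This is a simplified model rather than the exact ggml code, and it assumes the shard boundary coincides with the logical batch size:

```cpp
#include <cstdint>

// Simplified sketch (assumption: not the exact ggml implementation):
// choose the train/validation boundary from val_split, then round it
// down to a whole logical batch so no partial batch straddles the split.
int64_t train_split_size(int64_t ndata, float val_split, int64_t nbatch_logical) {
    int64_t idata_split = (int64_t)(ndata * (1.0f - val_split));
    idata_split -= idata_split % nbatch_logical; // round down to a batch boundary
    return idata_split;
}
```

With 60000 MNIST images, val_split = 0.05, and a logical batch of 512, this keeps 56832 samples for training and reserves the remaining 3168 for validation.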
Lower-Level API: ggml_opt_epoch
The per-epoch logic is handled by ggml_opt_epoch, which ggml_opt_fit calls internally:
void ggml_opt_epoch(
    ggml_opt_context_t opt_ctx,
    ggml_opt_dataset_t dataset,
    ggml_opt_result_t result_train,
    ggml_opt_result_t result_eval,
    int64_t idata_split,
    ggml_opt_epoch_callback callback_train,
    ggml_opt_epoch_callback callback_eval);
Here idata_split marks the index separating training data from validation data, and result_train / result_eval accumulate the loss and accuracy metrics for the two splits.
Source: src/ggml-opt.cpp:L880-923
For each epoch, ggml_opt_epoch performs:
- Shuffle the dataset shard order.
- Training iteration -- For each training shard: load data into input/output tensors, run the forward and backward computation graphs, accumulate gradients, and update parameters when a logical batch boundary is reached.
- Validation iteration -- For each validation shard: load data, run forward-only computation, and accumulate loss/accuracy metrics without updating weights.
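The interplay between physical shards and logical batches in the training iteration can be illustrated with a toy scalar simulation (hypothetical code, not ggml's implementation): gradients from consecutive shards are summed, and the parameter is updated only when a full logical batch has been accumulated.

```cpp
#include <cstdint>
#include <vector>

// Toy simulation of gradient accumulation (hypothetical, not ggml code).
// Each element of shard_grads stands for the gradient produced by one
// physical shard's backward pass; a parameter update happens only after
// shards_per_logical_batch shards have been accumulated.
double run_epoch(double param, double lr,
                 const std::vector<double> & shard_grads,
                 int64_t shards_per_logical_batch) {
    double  grad_acc = 0.0;
    int64_t n_acc    = 0;
    for (double g : shard_grads) {
        grad_acc += g; // accumulate this shard's gradient
        if (++n_acc == shards_per_logical_batch) {
            param -= lr * grad_acc / n_acc; // update on the mean gradient
            grad_acc = 0.0;                 // reset for the next logical batch
            n_acc    = 0;
        }
    }
    return param;
}
```

With four shards, two shards per logical batch, and a constant gradient of 1, the parameter is updated twice per epoch, each time by lr times the mean gradient.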
Usage Example: MNIST Training
The MNIST example provides a higher-level wrapper that demonstrates typical usage of ggml_opt_fit:
void mnist_model_train(
mnist_model & model,
ggml_opt_dataset_t dataset,
const int nepoch,
const float val_split);
Source: examples/mnist/mnist-common.cpp:L412-415
This function calls ggml_opt_fit with the model's backend scheduler, compute context, input and output tensors, and the provided training parameters. It uses GGML_OPT_LOSS_TYPE_CROSS_ENTROPY as the loss type and GGML_OPT_OPTIMIZER_TYPE_ADAMW as the optimizer:
#include "ggml-opt.h"
#include "mnist-common.h"

// Assuming model and dataset are already initialized:
mnist_model_train(model, dataset, /*nepoch=*/30, /*val_split=*/0.05f);

// Internally this calls:
// ggml_opt_fit(
//     model.backend_sched,
//     model.ctx_compute,
//     model.inputs,
//     model.outputs,
//     dataset,
//     GGML_OPT_LOSS_TYPE_CROSS_ENTROPY,
//     GGML_OPT_OPTIMIZER_TYPE_ADAMW,
//     NULL,    // get_opt_pars: use defaults
//     30,      // nepoch
//     512,     // nbatch_logical
//     0.05f,   // val_split: 5% for validation
//     false);  // silent: show progress