Principle:Mlflow Mlflow Training Execution

Knowledge Sources	MLflow Tracking MLflow
Domains	ML_Ops, Experiment_Tracking
Last Updated	2026-02-13 20:00 GMT

Overview

Executing the model training logic: user-defined code that applies a learning algorithm to data to produce a trained model.

Description

Training execution is the core computational step in any machine learning workflow. It is the phase where a learning algorithm processes training data, iteratively adjusting model parameters (weights, coefficients, splits) to minimize a loss function or optimize an objective. This step is entirely framework-dependent and user-defined: the experiment tracking system does not prescribe how training is performed, only that its inputs (parameters) and outputs (metrics, models) are recorded.

The training step sits at the center of the experiment tracking workflow. It consumes the parameters that were logged in the preceding step and produces the metrics and model artifacts that will be logged in subsequent steps. The tracking system wraps this step with an active run context, but the training logic itself operates independently of any tracking API. This separation of concerns is deliberate: practitioners should be free to use any ML framework (scikit-learn, PyTorch, TensorFlow, XGBoost, LightGBM, or custom implementations) without the tracking layer imposing constraints on their training code.
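A minimal sketch of this separation in plain Python, written so it runs without MLflow installed. `train_linear` stands in for user training code (here a closed-form 1-D least-squares fit, the kind of one-call training a framework's fit method performs) and contains no tracking calls. `InMemoryRun` and `tracked_training` are hypothetical helpers: the run object only mimics the names of MLflow's fluent `mlflow.log_param` and `mlflow.log_metric` functions and is not MLflow's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def train_linear(
    xs: List[float], ys: List[float], fit_intercept: bool
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """User-defined training code (closed-form 1-D least squares); no tracking calls."""
    n = len(xs)
    if fit_intercept:
        mx, my = sum(xs) / n, sum(ys) / n
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
        intercept = my - slope * mx
    else:
        slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        intercept = 0.0
    mse = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / n
    return {"slope": slope, "intercept": intercept}, {"train_mse": mse}


@dataclass
class InMemoryRun:
    """Hypothetical stand-in for a tracking run: it stores only what is logged to it."""
    params: Dict[str, object] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)

    def log_param(self, key: str, value: object) -> None:
        self.params[key] = value

    def log_metric(self, key: str, value: float) -> None:
        self.metrics[key] = value


def tracked_training(
    run: InMemoryRun, xs: List[float], ys: List[float], fit_intercept: bool
) -> Dict[str, float]:
    # Inputs are logged before training starts.
    run.log_param("fit_intercept", fit_intercept)
    # Training runs as a black box; the run never sees its internals.
    model, metrics = train_linear(xs, ys, fit_intercept)
    # Outputs are logged once training completes.
    for key, value in metrics.items():
        run.log_metric(key, value)
    return model
```

In real MLflow code the wrapper would usually open a run with `with mlflow.start_run():` and call `mlflow.log_param` and `mlflow.log_metric` around the same, unchanged training call.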

Training execution may be as simple as a single call to a framework's fit method, or as complex as a multi-epoch loop with custom learning rate schedules, gradient accumulation, mixed-precision training, and distributed data parallelism. Regardless of complexity, the principle remains the same: the training step transforms data and configuration into a trained model and performance observations.
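At the complex end the loop grows, but the contract does not. The sketch below, in plain Python and assuming a 1-D linear model with mean-squared-error loss, shows a multi-epoch mini-batch loop with two of the techniques listed above: a step-decay learning-rate schedule and gradient accumulation (one parameter update per `accum_steps` micro-batches). Mixed precision and distributed data parallelism need a real framework and are left out. `train_sgd`, `step_decay_lr` and their parameters are illustrative names, not part of any library. Like a single fit call, it returns a trained model plus performance observations.

```python
import random
from typing import Dict, List, Tuple


def step_decay_lr(base_lr: float, epoch: int, drop: float = 0.5, every: int = 10) -> float:
    """Step-decay schedule: multiply the rate by `drop` once every `every` epochs."""
    return base_lr * (drop ** (epoch // every))


def train_sgd(
    xs: List[float],
    ys: List[float],
    epochs: int = 30,
    batch_size: int = 4,
    accum_steps: int = 2,
    base_lr: float = 0.3,
    seed: int = 0,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Mini-batch SGD for y ~ w * x + b with MSE loss, an LR schedule and gradient accumulation."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    epoch_mse: List[float] = []
    for epoch in range(epochs):
        lr = step_decay_lr(base_lr, epoch)
        rng.shuffle(order)  # new mini-batch composition every epoch
        acc_w, acc_b, micro = 0.0, 0.0, 0
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # Gradient of the micro-batch mean squared error with respect to w and b.
            errors = [(w * xs[i] + b - ys[i], xs[i]) for i in batch]
            acc_w += sum(2 * e * x for e, x in errors) / len(batch)
            acc_b += sum(2 * e for e, _ in errors) / len(batch)
            micro += 1
            # Apply one update per `accum_steps` micro-batches, using the averaged gradient.
            if micro == accum_steps:
                w -= lr * acc_w / micro
                b -= lr * acc_b / micro
                acc_w, acc_b, micro = 0.0, 0.0, 0
        if micro:  # flush a partially accumulated gradient at the end of the epoch
            w -= lr * acc_w / micro
            b -= lr * acc_b / micro
        epoch_mse.append(sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs))
    return {"w": w, "b": b}, {"first_epoch_mse": epoch_mse[0], "final_mse": epoch_mse[-1]}
```

Gradient accumulation trades update frequency for a larger effective batch (here `batch_size * accum_steps` examples per update) without processing that batch at once; in PyTorch the same effect is commonly achieved by calling `backward()` on several micro-batches before each optimizer step.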

Usage

Execute training code within an active experiment run context, after parameters have been logged. Use whatever ML framework and training procedure is appropriate for the task. The tracking system does not need to be aware of the training internals; it only needs to receive the outputs (metrics and artifacts) once training completes or at intermediate checkpoints. For long-running training jobs, consider logging intermediate metrics at regular intervals (per epoch or per N steps) to enable early stopping decisions and live monitoring through the tracking UI.
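A sketch of per-epoch logging combined with early stopping, in plain Python. `train_one_epoch` and `evaluate` are hypothetical hooks into the user's training code, and the patience/min-delta rule is one common early-stopping criterion chosen for illustration, not something the tracking system provides. `log_metric` is any callable taking `(key, value, step=...)`; MLflow's `mlflow.log_metric` accepts a `step` argument, so inside an active run it could be passed in directly, but the sketch does not require MLflow.

```python
from typing import Callable, Tuple


def run_with_early_stopping(
    train_one_epoch: Callable[[int], float],
    evaluate: Callable[[], float],
    log_metric: Callable[..., None],
    max_epochs: int = 100,
    patience: int = 3,
    min_delta: float = 0.0,
) -> Tuple[int, float]:
    """Train epoch by epoch, log metrics each epoch, and stop when validation loss stalls.

    Returns the index of the best epoch and its validation loss.
    """
    best_epoch, best_loss = -1, float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_loss = train_one_epoch(epoch)
        val_loss = evaluate()
        # Intermediate metrics are logged with a step so progress is visible mid-run.
        log_metric("train_loss", train_loss, step=epoch)
        log_metric("val_loss", val_loss, step=epoch)
        if val_loss < best_loss - min_delta:
            best_epoch, best_loss = epoch, val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_loss
```

Passing `step=epoch` keeps each metric's history ordered by epoch, which is what lets a tracking UI plot the curve while the job is still running.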

Theoretical Basis

Training execution follows the optimization loop paradigm common to nearly all machine learning:

1. Initialization: Model parameters are initialized (randomly, from a pretrained checkpoint, or via a heuristic). The training data is loaded and optionally partitioned into batches.

2. Forward Pass: For each batch (or the full dataset), the model computes predictions from inputs. The loss function compares these predictions against ground truth labels or target values.

3. Backward Pass / Update: The optimization algorithm computes gradients (or other update signals) and adjusts model parameters to reduce the loss. In neural networks, this is backpropagation followed by a gradient descent step. Tree-based methods have no backward pass; the corresponding step is choosing splits that maximize a criterion such as information gain.

4. Iteration: Steps 2 and 3 repeat for a configured number of epochs or until a convergence criterion is met. After each epoch or evaluation interval, validation metrics may be computed on held-out data.

5. Termination: Training ends when the stopping condition is satisfied. The resulting model object and final metrics are available for logging.
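The five phases can be traced in a small, self-contained example: full-batch gradient descent for 1-D logistic regression in plain Python. The model, learning rate, tolerance and helper names are illustrative assumptions, and the sketch covers only the gradient-based case of step 3 (a tree learner would replace the update with split selection). It assumes non-separable training data, where the log-loss minimum is finite and the convergence test eventually fires.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def accuracy(w: float, b: float, xs: List[float], ys: List[int]) -> float:
    correct = sum(int((sigmoid(w * x + b) >= 0.5) == bool(y)) for x, y in zip(xs, ys))
    return correct / len(xs)


def train_logistic(
    train: Tuple[List[float], List[int]],
    val: Tuple[List[float], List[int]],
    lr: float = 0.5,
    max_epochs: int = 5000,
    tol: float = 1e-10,
) -> Tuple[Tuple[float, float], List[Tuple[float, float]]]:
    xs, ys = train
    n = len(xs)
    # 1. Initialization: parameters start at zero; the whole training set is one batch.
    w, b = 0.0, 0.0
    prev_loss = math.inf
    history: List[Tuple[float, float]] = []
    for _ in range(max_epochs):
        # 2. Forward pass: predictions and mean log loss against the labels.
        preds = [sigmoid(w * x + b) for x in xs]
        loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p) for p, y in zip(preds, ys)) / n
        # 3. Backward pass / update: gradient of the mean log loss, then a descent step.
        grad_w = sum((p - y) * x for p, x, y in zip(preds, xs, ys)) / n
        grad_b = sum(p - y for p, y in zip(preds, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
        # 4. Iteration: record this epoch's training loss and held-out accuracy.
        history.append((loss, accuracy(w, b, *val)))
        # 5. Termination: stop once the loss has effectively stopped changing.
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
    return (w, b), history
```

The returned `history` holds the per-epoch training loss and validation accuracy: the kind of outputs a user would then choose to log to the active run.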

The tracking system treats this entire process as a black box. Its role is to provide the execution context (the active run) and to accept the outputs (metrics, parameters, artifacts) that the user explicitly logs.

Related Pages

Implemented By

Implementation:Mlflow_Mlflow_User_Training_Code
