Workflow: MLflow Experiment Tracking

Knowledge Sources	MLflow, MLflow Tracking Docs, MLflow Quickstart
Domains	ML_Ops, Experiment_Management, Model_Training
Last Updated	2026-02-13 20:00 GMT

Overview

End-to-end process for tracking machine learning experiments by logging parameters, metrics, artifacts, and models across training runs using MLflow's fluent API.

Description

This workflow covers the standard procedure for instrumenting ML training code with MLflow experiment tracking. It lets data scientists and ML engineers record the key facts of a training run (hyperparameters, evaluation metrics, trained model artifacts, and dataset metadata) in a tracking store, which can be a local directory or a centralized tracking server. The fluent API is a high-level interface of module-level functions such as mlflow.start_run and mlflow.log_param; used as a context manager, it starts and ends runs automatically. Logged data can be viewed in the MLflow UI for comparison and analysis across runs.

Key capabilities:

Parameter logging for hyperparameter tracking
Metric logging with step-based time series support
Artifact storage for models, plots, and data files
Dataset input tracking with provenance
Automatic experiment organization and run grouping

Usage

Execute this workflow when you are training or evaluating a machine learning model and need to systematically record hyperparameters, performance metrics, and model artifacts for comparison, reproducibility, or deployment promotion. This applies to any ML framework — scikit-learn, PyTorch, TensorFlow, XGBoost, LightGBM, and others.

Execution Steps

Step 1: Configure Tracking Environment

Set up the MLflow tracking URI and experiment name to determine where run data is stored. The tracking URI can point to a local file store, a remote tracking server, or a Databricks workspace. The experiment groups related runs for comparison.

Key considerations:

Set the tracking URI with the MLFLOW_TRACKING_URI environment variable or a call to mlflow.set_tracking_uri() before starting runs
Create or select an experiment by name to organize related runs
The default local store writes to an mlruns directory

Step 2: Start a Run

Initialize a new MLflow run using the context manager pattern, which automatically handles run lifecycle (start and end). Each run receives a unique ID and is associated with the active experiment. Runs can be nested for complex workflows like hyperparameter tuning.

Key considerations:

Use the context manager (with block) for automatic run termination
Optionally provide a run name for human-readable identification
Nested runs support parent-child relationships for hierarchical experiments

Step 3: Log Parameters

Record the hyperparameters and configuration values used for the training run. Parameters are key-value pairs whose values are typically strings, numbers, or booleans; MLflow stores them as strings. Each parameter is logged once per run and represents the configuration that produced the results.

Key considerations:

Log individual parameters or batch-log a dictionary of parameters
Parameters are immutable once logged: logging an existing key again with a different value raises an error
Common parameters include learning rate, batch size, number of epochs, and model architecture choices

Step 4: Execute Training

Run the actual model training logic. This is the user's existing training code — MLflow does not modify the training process itself. During training, metrics can be logged at each step or epoch to capture the training progression.

Key considerations:

Training code runs unchanged — MLflow only instruments the logging
For iterative training, log metrics at each step with a step counter
System metrics (CPU, GPU, memory) can be optionally captured

Step 5: Log Metrics

Record performance metrics computed during or after training. Metrics support step-based logging for time-series visualization (e.g., loss per epoch). Multiple metrics can be logged simultaneously via batch logging.

Key considerations:

Metrics can be logged multiple times with different step values for time-series tracking
Common metrics include accuracy, loss, precision, recall, F1 score, and RMSE
Batch logging reduces overhead when logging many metrics at once

Step 6: Log Artifacts and Model

Store trained model files, evaluation plots, data samples, and other output files as run artifacts. The model can be logged using a framework-specific flavor (e.g., sklearn, pytorch) which captures the model along with its dependencies and inference signature.

Key considerations:

Use framework-specific log_model functions (for example, mlflow.sklearn.log_model) for proper model packaging
Log input examples and model signatures for schema enforcement during inference
Additional artifacts like confusion matrices, feature importance plots, and sample predictions enhance run documentation

Step 7: Review and Compare

Access the MLflow tracking UI to visualize logged runs, compare metrics across experiments, and identify the best-performing model configurations. The UI supports filtering, sorting, and charting of run data.

Key considerations:

Launch the UI with the mlflow ui command, or run mlflow server to host a tracking server that also serves the UI
Compare runs side-by-side using the comparison view
Use search filters to narrow down runs by parameters or metrics
