Workflow:Mlflow Mlflow Experiment Tracking

From Leeroopedia
Knowledge Sources
Domains ML_Ops, Experiment_Management, Model_Training
Last Updated 2026-02-13 20:00 GMT

Overview

End-to-end process for tracking machine learning experiments by logging parameters, metrics, artifacts, and models across training runs using MLflow's fluent API.

Description

This workflow covers the standard procedure for instrumenting ML training code with MLflow experiment tracking. It enables data scientists and ML engineers to record every aspect of a training run (hyperparameters, evaluation metrics, trained model artifacts, and dataset metadata) in a centralized tracking server. The fluent API provides a high-level, context-manager-based interface that automatically manages the run lifecycle. Logged data is viewable through the MLflow UI for comparison and analysis across runs.

Key capabilities:

  • Parameter logging for hyperparameter tracking
  • Metric logging with step-based time series support
  • Artifact storage for models, plots, and data files
  • Dataset input tracking with provenance
  • Automatic experiment organization and run grouping

Usage

Execute this workflow when you are training or evaluating a machine learning model and need to systematically record hyperparameters, performance metrics, and model artifacts for comparison, reproducibility, or deployment promotion. This applies to any ML framework — scikit-learn, PyTorch, TensorFlow, XGBoost, LightGBM, and others.

Execution Steps

Step 1: Configure Tracking Environment

Set up the MLflow tracking URI and experiment name to determine where run data is stored. The tracking URI can point to a local file store, a remote tracking server, or a Databricks workspace. The experiment groups related runs for comparison.

Key considerations:

  • Set tracking URI via environment variable or API call before starting runs
  • Create or select an experiment by name to organize related runs
  • The default local store writes to an mlruns directory

Step 2: Start a Run

Initialize a new MLflow run using the context manager pattern, which automatically handles run lifecycle (start and end). Each run receives a unique ID and is associated with the active experiment. Runs can be nested for complex workflows like hyperparameter tuning.

Key considerations:

  • Use the context manager (with block) for automatic run termination
  • Optionally provide a run name for human-readable identification
  • Nested runs support parent-child relationships for hierarchical experiments

Step 3: Log Parameters

Record the hyperparameters and configuration values used for the training run. Parameters are key-value pairs whose values are typically strings, numbers, or booleans; MLflow stores each value as a string. They are logged once per run and represent the configuration that produced the results.

Key considerations:

  • Log individual parameters or batch-log a dictionary of parameters
  • Parameters are immutable once logged: logging an existing key again with a different value raises an error
  • Common parameters include learning rate, batch size, number of epochs, and model architecture choices

Step 4: Execute Training

Run the actual model training logic. This is the user's existing training code — MLflow does not modify the training process itself. During training, metrics can be logged at each step or epoch to capture the training progression.

Key considerations:

  • Training code runs unchanged — MLflow only instruments the logging
  • For iterative training, log metrics at each step with a step counter
  • System metrics (CPU, GPU, memory) can optionally be captured, for example by passing log_system_metrics=True to mlflow.start_run in recent MLflow releases

Step 5: Log Metrics

Record performance metrics computed during or after training. Metrics support step-based logging for time-series visualization (e.g., loss per epoch). Multiple metrics can be logged simultaneously via batch logging.

Key considerations:

  • Metrics can be logged multiple times with different step values for time-series tracking
  • Common metrics include accuracy, loss, precision, recall, F1 score, and RMSE
  • Batch logging reduces overhead when logging many metrics at once

Step 6: Log Artifacts and Model

Store trained model files, evaluation plots, data samples, and other output files as run artifacts. The model can be logged using a framework-specific flavor (e.g., sklearn, pytorch) which captures the model along with its dependencies and inference signature.

Key considerations:

  • Use framework-specific log_model functions for proper model packaging
  • Log input examples and model signatures for schema enforcement during inference
  • Additional artifacts like confusion matrices, feature importance plots, and sample predictions enhance run documentation

Step 7: Review and Compare

Access the MLflow tracking UI to visualize logged runs, compare metrics across experiments, and identify the best-performing model configurations. The UI supports filtering, sorting, and charting of run data.

Key considerations:

  • Launch the UI with the mlflow server or mlflow ui command
  • Compare runs side-by-side using the comparison view
  • Use search filters to narrow down runs by parameters or metrics
