Workflow:Mlflow Mlflow Experiment Tracking
| Knowledge Sources | |
|---|---|
| Domains | ML_Ops, Experiment_Management, Model_Training |
| Last Updated | 2026-02-13 20:00 GMT |
Overview
End-to-end process for tracking machine learning experiments by logging parameters, metrics, artifacts, and models across training runs using MLflow's fluent API.
Description
This workflow covers the standard procedure for instrumenting ML training code with MLflow experiment tracking. It lets data scientists and ML engineers record each aspect of a training run (hyperparameters, evaluation metrics, trained model artifacts, and dataset metadata) in a centralized tracking server. The fluent API provides a high-level, context-manager-based interface that manages the run lifecycle automatically. Logged data can be viewed in the MLflow UI for comparison and analysis across runs.
Key capabilities:
- Parameter logging for hyperparameter tracking
- Metric logging with step-based time series support
- Artifact storage for models, plots, and data files
- Dataset input tracking with provenance
- Automatic experiment organization and run grouping
Usage
Execute this workflow when you are training or evaluating a machine learning model and need to systematically record hyperparameters, performance metrics, and model artifacts for comparison, reproducibility, or deployment promotion. This applies to any ML framework — scikit-learn, PyTorch, TensorFlow, XGBoost, LightGBM, and others.
Execution Steps
Step 1: Configure Tracking Environment
Set up the MLflow tracking URI and experiment name to determine where run data is stored. The tracking URI can point to a local file store, a remote tracking server, or a Databricks workspace. The experiment groups related runs for comparison.
Key considerations:
- Set tracking URI via environment variable or API call before starting runs
- Create or select an experiment by name to organize related runs
- The default local store writes to an mlruns directory
Step 2: Start a Run
Initialize a new MLflow run using the context manager pattern, which starts the run on entry and ends it on exit. Each run receives a unique ID and is associated with the active experiment. Runs can be nested for complex workflows such as hyperparameter tuning.
Key considerations:
- Use the context manager (with block) for automatic run termination
- Optionally provide a run name for human-readable identification
- Nested runs support parent-child relationships for hierarchical experiments
Step 3: Log Parameters
Record the hyperparameters and configuration values used for the training run. Parameters are key-value pairs; values are typically strings, numbers, or booleans, and MLflow stores them as strings. They are logged once per run and represent the configuration that produced the results.
Key considerations:
- Log individual parameters or batch-log a dictionary of parameters
- Parameters are immutable once logged: logging the same key again with a different value raises an error
- Common parameters include learning rate, batch size, number of epochs, and model architecture choices
Step 4: Execute Training
Run the actual model training logic. This is the user's existing training code — MLflow does not modify the training process itself. During training, metrics can be logged at each step or epoch to capture the training progression.
Key considerations:
- Training code runs unchanged — MLflow only instruments the logging
- For iterative training, log metrics at each step with a step counter
- System metrics (CPU, GPU, memory) can be optionally captured
Step 5: Log Metrics
Record performance metrics computed during or after training. Metrics support step-based logging for time-series visualization (e.g., loss per epoch). Several metrics can be logged in a single call via batch logging.
Key considerations:
- Metrics can be logged multiple times with different step values for time-series tracking
- Common metrics include accuracy, loss, precision, recall, F1 score, and RMSE
- Batch logging reduces overhead when logging many metrics at once
Step 6: Log Artifacts and Model
Store trained model files, evaluation plots, data samples, and other output files as run artifacts. The model can be logged using a framework-specific flavor (e.g., sklearn, pytorch) which captures the model along with its dependencies and inference signature.
Key considerations:
- Use framework-specific log_model functions for proper model packaging
- Log input examples and model signatures for schema enforcement during inference
- Additional artifacts like confusion matrices, feature importance plots, and sample predictions enhance run documentation
Step 7: Review and Compare
Access the MLflow tracking UI to visualize logged runs, compare metrics across experiments, and identify the best-performing model configurations. The UI supports filtering, sorting, and charting of run data.
Key considerations:
- Launch the UI with the mlflow server or mlflow ui command
- Compare runs side-by-side using the comparison view
- Use search filters to narrow down runs by parameters or metrics