Principle: MLflow Artifact and Model Logging
| Knowledge Sources | |
|---|---|
| Domains | ML_Ops, Experiment_Tracking |
| Last Updated | 2026-02-13 20:00 GMT |
Overview
Persisting output files, serialized models, and associated metadata as versioned artifacts linked to experiment runs.
Description
While parameters and metrics capture the configuration and numeric results of an experiment, artifacts capture everything else: the trained model itself, data files, plots, configuration files, evaluation reports, and any other output that a practitioner needs to preserve. Artifact logging is the mechanism by which these files are uploaded to a durable storage backend and associated with a specific run, creating a complete record of what was produced.
Model logging is a specialized form of artifact logging. A trained model is serialized along with metadata that describes its dependencies, input/output signature, and the framework used to create it. This metadata enables the model to be loaded and served in a framework-agnostic manner. The combination of the serialized model, its dependency specification, and its signature constitutes a self-contained deployment unit that can be registered, versioned, and promoted through a model registry.
The separation between generic artifact logging and model-specific logging reflects a difference in downstream usage. Generic artifacts (plots, CSV files, text reports) are primarily for human consumption and analysis. Logged models are for both human review and automated deployment pipelines that need to load, validate, and serve models without manual intervention.
Usage
Log artifacts whenever a run produces files that are needed for reproducibility, analysis, or deployment. Log individual files when only specific outputs are relevant. Log entire directories when a run produces a structured output (such as a model directory with multiple files). Use framework-specific model logging (e.g., the scikit-learn, PyTorch, or TensorFlow flavor) rather than generic artifact logging for trained models, because model logging adds the metadata needed for inference and deployment.
Theoretical Basis
Artifact and model logging implements a versioned binary store pattern:
Artifact URI Hierarchy: Each run has a root artifact URI. Artifacts are organized within this URI using a path hierarchy, analogous to a filesystem. Specifying an artifact_path places the artifact in a subdirectory of this root, enabling logical grouping (e.g., plots/, data/, models/).
Content Immutability: Once an artifact is logged to a run, it is immutable. The same path within the same run cannot be overwritten. This guarantees that any reference to a run's artifacts will always resolve to the same content, which is essential for reproducibility and audit trails.
Model Flavor Abstraction: When logging a model, the system records one or more "flavors" -- serialization formats that different consumers understand. A scikit-learn model might be logged with both an sklearn flavor and a python_function (pyfunc) flavor. The pyfunc flavor provides a universal interface (predict) that model serving infrastructure can use without knowing the underlying framework.
Signature and Schema: Logged models can include an input/output signature that describes the expected data types and shapes. This signature serves as a contract between the model producer and consumer, enabling validation at inference time and documentation in the model registry.
Registration Bridge: Model logging can optionally trigger registration in a model registry, creating a named, versioned entry that connects the training workflow to the deployment workflow.