Principle: Dagster_io_Dagster ML Model Lifecycle
| Property | Value |
|---|---|
| Type | Principle |
| Category | Machine_Learning, MLOps |
| Repository | Dagster_io_Dagster |
| Related Implementation | Implementation:Dagster_io_Dagster_ML_Pipeline_Assets |
## Overview
Pattern for managing the complete machine learning model lifecycle (training, evaluation, deployment, inference) as a DAG of software-defined assets with quality gates.
## Description
The ML model lifecycle in Dagster models each phase of ML development as distinct assets connected through dependency relationships. Training produces a model artifact, evaluation measures quality metrics, deployment applies quality gates (accuracy thresholds) before promoting to production, and inference serves predictions from the deployed model. Dagster's Config classes parameterize each phase (hyperparameters, thresholds, batch sizes), while resource abstractions (ModelStoreResource) provide pluggable storage backends (local filesystem, S3).
The core stages of the lifecycle are:
- Training -- Produces a model artifact from input data and hyperparameters
- Evaluation -- Measures quality metrics (accuracy, loss, etc.) on held-out data
- Deployment -- Applies quality gates (accuracy thresholds) before promoting the model to production
- Inference -- Serves predictions from the deployed production model
Each stage is represented as a Dagster asset, with explicit data dependencies between them forming a directed acyclic graph.
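The four stages and their ordering can be sketched as plain functions whose parameters encode the dependencies; in Dagster each function would be decorated with `@asset`, with upstream assets passed in as arguments. The function names, the mean-predictor "model", and the tolerance-based accuracy are illustrative stand-ins, not the repository's actual code:

```python
def train(data, learning_rate=0.1):
    # Training: produce a model artifact from input data and hyperparameters.
    # The "model" here is just a mean predictor, for illustration.
    return {"mean": sum(data) / len(data)}

def evaluate(model, holdout):
    # Evaluation: measure quality metrics on held-out data
    # (here: fraction of points within a fixed tolerance of the prediction).
    correct = sum(1 for y in holdout if abs(y - model["mean"]) < 1.0)
    return {"accuracy": correct / len(holdout)}

def deploy(model, metrics, threshold=0.8):
    # Deployment: quality gate -- promote only if accuracy clears the threshold.
    if metrics["accuracy"] < threshold:
        return None  # gate closed; keep the current production model
    return model     # gate open; this model becomes the production model

def infer(production_model, batch):
    # Inference: serve predictions from the deployed production model.
    return [production_model["mean"] for _ in batch]

# DAG ordering enforced by the data dependencies: train -> evaluate -> deploy -> infer
data, holdout = [1.0, 1.2, 0.9], [1.0, 1.1, 5.0]
model = train(data)
metrics = evaluate(model, holdout)
prod = deploy(model, metrics, threshold=0.5)
predictions = infer(prod, ["x1", "x2"]) if prod is not None else []
```

Because each downstream function takes the upstream output as an argument, the correct ordering falls out of the dependency graph rather than being scheduled by hand.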
## Usage
Use when building production ML pipelines that need:
- Reproducible training with configurable hyperparameters
- Automated quality gates before deployment
- Pluggable model storage (local filesystem, S3, cloud blob stores)
- Both batch and real-time inference capabilities
- Experiment tracking through parameterized Config classes
This pattern is appropriate for any supervised learning workflow where model artifacts must pass validation before serving predictions.
## Theoretical Basis
The ML lifecycle follows the pipeline pattern with quality gates. Each stage (train -> evaluate -> deploy -> infer) is modeled as an asset with explicit inputs/outputs.
- The deployment gate implements a threshold-based decision function: deploy only if accuracy >= threshold. This acts as a binary classifier on model quality, preventing regression in production.
- The abstract ModelStoreResource follows the strategy pattern for pluggable storage backends. Concrete implementations (local filesystem, S3) are interchangeable without modifying pipeline logic.
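The storage strategy pattern can be sketched with a stdlib-only stand-in; in Dagster the abstract class would instead subclass `ConfigurableResource` (the source calls it `ModelStoreResource`), and `LocalFileModelStore` is a hypothetical name for the filesystem backend:

```python
from abc import ABC, abstractmethod
import json
import pathlib
import tempfile

class ModelStore(ABC):
    # Abstract storage strategy; plays the role of ModelStoreResource.
    @abstractmethod
    def save(self, name: str, model: dict) -> None: ...

    @abstractmethod
    def load(self, name: str) -> dict: ...

class LocalFileModelStore(ModelStore):
    # Local-filesystem backend; an S3-backed class with the same interface
    # could be swapped in without touching any pipeline logic.
    def __init__(self, root: str):
        self.root = pathlib.Path(root)

    def save(self, name: str, model: dict) -> None:
        (self.root / f"{name}.json").write_text(json.dumps(model))

    def load(self, name: str) -> dict:
        return json.loads((self.root / f"{name}.json").read_text())

# Pipeline code depends only on the ModelStore interface.
with tempfile.TemporaryDirectory() as d:
    store: ModelStore = LocalFileModelStore(d)
    store.save("prod", {"mean": 1.03})
    restored = store.load("prod")
```

Swapping backends then becomes a configuration decision made at run launch, not a code change in the assets.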
- Config classes separate hyperparameters from logic, enabling experiment tracking and reproducibility. Each configuration set defines a unique point in the hyperparameter search space.
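The separation of hyperparameters from logic can be sketched with a frozen dataclass standing in for a Dagster `Config` subclass (which is pydantic-based); `TrainingConfig`, `run_experiment`, and the stubbed accuracy are illustrative assumptions:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class TrainingConfig:
    # Hyperparameters live in a config object, not in the training code;
    # Dagster's Config classes play this role (this dataclass is a stand-in).
    learning_rate: float = 0.1
    epochs: int = 10
    batch_size: int = 32

def run_experiment(config: TrainingConfig) -> dict:
    # Logic reads hyperparameters only from the config; recording
    # asdict(config) alongside the metrics makes every run reproducible.
    return {"params": asdict(config), "accuracy": 0.9}  # stubbed result

# Each config instance is one point in the hyperparameter search space.
grid = [TrainingConfig(learning_rate=lr) for lr in (0.01, 0.1)]
results = [run_experiment(c) for c in grid]
```

Because the config object is immutable and serializable, the exact parameter set of any past run can be replayed or compared against later experiments.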
- The DAG structure ensures that evaluation always runs after training, and deployment always runs after evaluation, enforcing the correct ordering of the lifecycle.