
Principle:Dagster io Dagster ML Model Lifecycle

From Leeroopedia


Type: Principle
Category: Machine_Learning, MLOps
Repository: Dagster_io_Dagster
Related Implementation: Implementation:Dagster_io_Dagster_ML_Pipeline_Assets

Overview

A pattern for managing the complete machine learning model lifecycle (training, evaluation, deployment, and inference) as a DAG of software-defined assets with quality gates.

Description

The ML model lifecycle in Dagster models each phase of ML development as distinct assets connected through dependency relationships. Training produces a model artifact, evaluation measures quality metrics, deployment applies quality gates (accuracy thresholds) before promoting to production, and inference serves predictions from the deployed model. Dagster's Config classes parameterize each phase (hyperparameters, thresholds, batch sizes), while resource abstractions (ModelStoreResource) provide pluggable storage backends (local filesystem, S3).

The core stages of the lifecycle are:

  • Training -- Produces a model artifact from input data and hyperparameters
  • Evaluation -- Measures quality metrics (accuracy, loss, etc.) on held-out data
  • Deployment -- Applies quality gates (accuracy thresholds) before promoting the model to production
  • Inference -- Serves predictions from the deployed production model

Each stage is represented as a Dagster asset, with explicit data dependencies between them forming a directed acyclic graph.
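The four stages and their DAG ordering can be sketched without any framework at all. The following is a minimal, framework-free illustration: in Dagster each function would become a software-defined asset, whereas here the dependencies are made explicit by ordinary function composition. All names (train_model, evaluate_model, deploy_model, predict) and the toy "model" are illustrative, not part of Dagster's API.

```python
# Framework-free sketch of the train -> evaluate -> deploy -> infer DAG.
# Each function stands in for one software-defined asset.

def train_model(data, learning_rate):
    # Toy "model" artifact: predict the mean of the training labels.
    labels = [y for _, y in data]
    return {"mean": sum(labels) / len(labels)}

def evaluate_model(model, holdout):
    # Quality metric: fraction of labels within 0.5 of the prediction.
    hits = sum(1 for _, y in holdout if abs(y - model["mean"]) < 0.5)
    return {"accuracy": hits / len(holdout)}

def deploy_model(model, metrics, threshold=0.8):
    # Quality gate: promote to production only if accuracy clears the threshold.
    if metrics["accuracy"] >= threshold:
        return {"production_model": model, "deployed": True}
    return {"production_model": None, "deployed": False}

def predict(deployment, features):
    # Inference serves predictions from the deployed production model only.
    if not deployment["deployed"]:
        raise RuntimeError("no production model available")
    return deployment["production_model"]["mean"]

# Wire the stages together; this ordering is what the asset DAG enforces.
data = [(x, 1.0) for x in range(10)]
model = train_model(data, learning_rate=0.01)
metrics = evaluate_model(model, holdout=data)
deployment = deploy_model(model, metrics, threshold=0.8)
```

Because deployment consumes both the model artifact and the evaluation metrics, a model that fails the gate simply never reaches the inference stage.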

Usage

Use when building production ML pipelines that need:

  • Reproducible training with configurable hyperparameters
  • Automated quality gates before deployment
  • Pluggable model storage (local filesystem, S3, cloud blob stores)
  • Both batch and real-time inference capabilities
  • Experiment tracking through parameterized Config classes

This pattern is appropriate for any supervised learning workflow where model artifacts must pass validation before serving predictions.
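The "experiment tracking through parameterized Config classes" point can be sketched with a plain dataclass standing in for a Dagster Config subclass; the class and field names below are illustrative assumptions, not Dagster's own.

```python
# Sketch: separate hyperparameters from pipeline logic via a config object,
# in the spirit of Dagster's Config classes (names here are illustrative).
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class TrainingConfig:
    learning_rate: float = 0.01
    epochs: int = 10
    batch_size: int = 32

# Each instance is one point in the hyperparameter search space; serializing
# it alongside the model artifact gives a reproducible experiment record.
cfg = TrainingConfig(learning_rate=0.001)
record = asdict(cfg)
```

A run launched with `cfg` can be replayed exactly by reconstructing the config from `record`, which is the essence of reproducible, trackable training.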

Theoretical Basis

The ML lifecycle follows the pipeline pattern with quality gates. Each stage (train -> evaluate -> deploy -> infer) is modeled as an asset with explicit inputs/outputs.

  • The deployment gate implements a threshold-based decision function: deploy only if accuracy >= threshold. This acts as a binary classifier on model quality, preventing regression in production.
  • The abstract ModelStoreResource follows the strategy pattern for pluggable storage backends. Concrete implementations (local filesystem, S3) are interchangeable without modifying pipeline logic.
  • Config classes separate hyperparameters from logic, enabling experiment tracking and reproducibility. Each configuration set defines a unique point in the hyperparameter search space.
  • The DAG structure ensures that evaluation always runs after training, and deployment always runs after evaluation, enforcing the correct ordering of the lifecycle.
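The strategy pattern behind the pluggable model store can be sketched as an abstract interface with interchangeable backends. The interface and class names below (ModelStore, LocalFileModelStore) are hypothetical stand-ins for an abstraction like ModelStoreResource, not Dagster's actual API.

```python
# Sketch of the strategy pattern for pluggable model storage backends.
import abc
import json
import tempfile
from pathlib import Path

class ModelStore(abc.ABC):
    """Abstract storage strategy; pipeline code depends only on this."""

    @abc.abstractmethod
    def save(self, name: str, model: dict) -> None: ...

    @abc.abstractmethod
    def load(self, name: str) -> dict: ...

class LocalFileModelStore(ModelStore):
    """Concrete strategy: persist model artifacts as JSON files on disk."""

    def __init__(self, root: Path):
        self.root = Path(root)

    def save(self, name: str, model: dict) -> None:
        (self.root / f"{name}.json").write_text(json.dumps(model))

    def load(self, name: str) -> dict:
        return json.loads((self.root / f"{name}.json").read_text())

# An S3-backed store could implement the same two methods; swapping it in
# requires no change to training, evaluation, or deployment logic.
with tempfile.TemporaryDirectory() as d:
    store = LocalFileModelStore(Path(d))
    store.save("baseline", {"mean": 1.0})
    restored = store.load("baseline")
```

Because every backend satisfies the same save/load contract, the choice of storage is a deployment-time configuration detail rather than a pipeline change.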
