Workflow: Dagster ML Pipeline
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, ML_Ops, Model_Training |
| Last Updated | 2026-02-10 12:00 GMT |
Overview
End-to-end process for building a production-ready machine learning pipeline using Dagster and PyTorch, covering data preparation, CNN model training, evaluation with quality gates, and deployment with inference services.
Description
This workflow demonstrates how to orchestrate a complete machine learning lifecycle with Dagster. It downloads and preprocesses the MNIST handwritten digit dataset, trains a convolutional neural network (CNN) with configurable hyperparameters, evaluates model performance with per-class metrics and confusion matrices, implements automated quality gates for deployment decisions, and provides batch and real-time prediction services. The pipeline uses Dagster assets for each stage, rich metadata for UI visibility, and configurable deployment strategies.
Usage
Execute this workflow when you need to build a reproducible ML training pipeline with automated quality assurance. This is appropriate for teams that want to standardize model training, enforce quality thresholds before deployment, and maintain full lineage from data preparation through inference. Requires Python 3.10+ and PyTorch.
Execution Steps
Step 1: Data Ingestion and Preprocessing
Download the MNIST dataset using PyTorch utilities, normalize pixel values with precomputed statistics, and perform a stratified train/validation split. Two assets are produced: the raw downloaded dataset and the processed, split dataset. Comprehensive metadata is generated for each asset, providing dataset statistics visible in the Dagster UI.
Key considerations:
- Normalization uses precomputed MNIST statistics (mean=0.1307, std=0.3081) for reproducibility
- Stratified splitting (80/20) preserves class distribution across training and validation sets
- Rich metadata (sample counts, class distributions) is recorded for UI-based monitoring
- Centralized constants ensure consistent preprocessing across training and inference
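The stratified split described above can be sketched in plain Python. The function name `stratified_split` and the constant names are illustrative, not taken from the pipeline's actual code; the normalization statistics are the precomputed MNIST values cited above.

```python
import random
from collections import defaultdict

# Precomputed MNIST normalization statistics (shared by training and inference)
MNIST_MEAN = 0.1307
MNIST_STD = 0.3081

def stratified_split(labels, val_fraction=0.2, seed=42):
    """Return (train_idx, val_idx), preserving per-class proportions.

    Indices are grouped by class, shuffled deterministically, and the
    requested fraction of each class is carved off for validation, so the
    80/20 split keeps the class distribution of the full dataset.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_val = int(len(idxs) * val_fraction)
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return train_idx, val_idx
```

Because each class is split independently, a rare digit is never accidentally absent from the validation set, which a naive random split can cause.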
Step 2: Model Training
Train a three-layer CNN (DigitCNN) on the preprocessed MNIST data with configurable hyperparameters. The model architecture includes batch normalization and dropout for regularization. A ModelConfig class exposes learning rate, optimizer choice, early stopping, and other training parameters. The trained model is persisted with a descriptive filename encoding the training configuration.
Key considerations:
- The CNN architecture (DigitCNN) uses three convolutional layers with batch normalization and dropout
- ModelConfig provides configurable hyperparameters without code changes
- Early stopping prevents overfitting by monitoring validation loss
- Model filenames encode training configuration for experiment tracking
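A minimal sketch of the configuration and early-stopping pieces, assuming a dataclass-style `ModelConfig`; the field names, defaults, and filename scheme here are hypothetical stand-ins for the pipeline's actual definitions.

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    """Training hyperparameters, adjustable without code changes."""
    learning_rate: float = 0.001
    optimizer: str = "adam"
    epochs: int = 10
    batch_size: int = 64
    early_stopping_patience: int = 3

    def model_filename(self) -> str:
        """Encode the training configuration into a descriptive filename."""
        return (f"digit_cnn_lr{self.learning_rate}_{self.optimizer}"
                f"_ep{self.epochs}_bs{self.batch_size}.pt")

class EarlyStopping:
    """Stop training when validation loss fails to improve for `patience` epochs."""
    def __init__(self, patience: int = 3):
        self.patience = patience
        self.best = float("inf")
        self.counter = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best:
            self.best = val_loss
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience
```

Encoding the configuration into the filename means a directory listing doubles as a lightweight experiment log.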
Step 3: Model Evaluation
Evaluate the trained model on the validation set, computing per-class precision, recall, and F1 scores along with a confusion matrix. The evaluation asset depends on both the model and the processed data assets, ensuring consistent data/model pairing. All metrics are recorded as Dagster metadata for historical comparison.
Key considerations:
- Per-class metrics identify categories where the model underperforms
- Confusion matrix visualization highlights systematic misclassification patterns
- Metrics are stored as asset metadata for cross-run comparison in the Dagster UI
- Evaluation uses scikit-learn metrics for standardized computation
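The pipeline delegates metric computation to scikit-learn; the sketch below re-derives the same per-class quantities in plain Python to make explicit what is being computed. The helper names are illustrative, not the pipeline's API.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """cm[t][p] counts samples with true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def per_class_metrics(cm):
    """Precision, recall, and F1 for each class, derived from the matrix."""
    n = len(cm)
    metrics = {}
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp   # column minus diagonal
        fn = sum(cm[c]) - tp                        # row minus diagonal
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics
```

Reading a row of the matrix shows where a true class leaks to; reading a column shows which classes are mistaken for it, which is how systematic misclassification patterns surface.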
Step 4: Deployment with Quality Gates
Determine whether to deploy the trained model based on configurable quality thresholds. Three deployment strategies are supported: quality-based automatic deployment (model must exceed accuracy threshold), manual selection (human decision), and force deployment (bypass quality gates). An abstract storage interface supports both local filesystem and S3 backends.
Key considerations:
- Quality gates enforce minimum performance thresholds before production deployment
- Multiple deployment strategies accommodate different operational requirements
- Storage abstraction enables local development with production S3 deployment
- Deployment decisions are logged as metadata for audit trails
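The decision logic and storage abstraction can be sketched as follows. The function signature, strategy strings, and class names are assumptions for illustration; only the three strategies themselves come from the workflow above.

```python
import shutil
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path

def deployment_decision(accuracy: float, strategy: str = "quality",
                        threshold: float = 0.97, approved: bool = False) -> bool:
    """Return True if the model should be deployed under the given strategy."""
    if strategy == "force":
        return True                   # bypass quality gates entirely
    if strategy == "manual":
        return approved               # defer to a human decision
    if strategy == "quality":
        return accuracy >= threshold  # automatic gate on accuracy
    raise ValueError(f"unknown strategy: {strategy}")

class ModelStorage(ABC):
    """Abstract backend so local development and S3 production share one interface."""
    @abstractmethod
    def save(self, src: Path, name: str) -> str: ...

class LocalStorage(ModelStorage):
    """Filesystem backend; an S3 backend would implement the same interface."""
    def __init__(self, root: Path):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, src: Path, name: str) -> str:
        dest = self.root / name
        shutil.copy(src, dest)
        return str(dest)
```

Because callers depend only on `ModelStorage`, switching from local development to production S3 is a construction-time choice rather than a code change.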
Step 5: Inference Services
Provide batch and real-time prediction capabilities using the deployed model. The inference layer loads the most recently deployed model and applies consistent preprocessing (same normalization as training). Predictions include confidence scores and class labels.
Key considerations:
- Inference preprocessing must exactly match training preprocessing for valid results
- Both batch (file-based) and real-time (single-image) prediction modes are supported
- The inference asset depends on the deployed model asset for automatic lineage tracking
- Confidence scores accompany predictions for downstream decision making
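A sketch of the inference-side preprocessing and confidence computation, assuming raw 0-255 pixel input and raw model logits; the helper names are hypothetical, and the normalization constants are the same precomputed MNIST statistics used in Step 1.

```python
import math

# Must exactly match the training-time constants, or predictions are invalid.
MNIST_MEAN = 0.1307
MNIST_STD = 0.3081

def preprocess(pixels):
    """Scale raw 0-255 pixels to [0, 1], then apply the training normalization."""
    return [((p / 255.0) - MNIST_MEAN) / MNIST_STD for p in pixels]

def softmax(logits):
    """Convert raw logits to probabilities (max-subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Return (class_label, confidence) from the model's output logits."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]
```

Keeping the normalization constants in one shared module, as the workflow's centralized constants do, is what guarantees the training/inference preprocessing match that the first consideration above requires.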