Workflow: Dagster ML Pipeline
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, ML_Ops, Model_Training |
| Last Updated | 2026-02-10 12:00 GMT |
Overview
End-to-end process for building a production-ready machine learning pipeline using Dagster and PyTorch, covering data preparation, CNN model training, evaluation with quality gates, and deployment with inference services.
Description
This workflow demonstrates how to orchestrate a complete machine learning lifecycle with Dagster. It downloads and preprocesses the MNIST handwritten digit dataset, trains a convolutional neural network (CNN) with configurable hyperparameters, evaluates model performance with per-class metrics and confusion matrices, implements automated quality gates for deployment decisions, and provides batch and real-time prediction services. The pipeline uses Dagster assets for each stage, rich metadata for UI visibility, and configurable deployment strategies.
Usage
Execute this workflow when you need to build a reproducible ML training pipeline with automated quality assurance. This is appropriate for teams that want to standardize model training, enforce quality thresholds before deployment, and maintain full lineage from data preparation through inference. Requires Python 3.10+ and PyTorch.
Execution Steps
Step 1: Data Ingestion and Preprocessing
Download the MNIST dataset using PyTorch utilities, normalize pixel values with precomputed statistics, and perform a stratified train/validation split. Two assets are produced: the raw downloaded dataset and the processed, split dataset. Comprehensive metadata is generated for each asset, providing dataset statistics visible in the Dagster UI.
Key considerations:
- Normalization uses precomputed MNIST statistics (mean=0.1307, std=0.3081) for reproducibility
- Stratified splitting (80/20) preserves class distribution across training and validation sets
- Rich metadata (sample counts, class distributions) is recorded for UI-based monitoring
- Centralized constants ensure consistent preprocessing across training and inference
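The stratified split described above can be sketched in plain Python. The function name `stratified_split` and the constant names are illustrative, not taken from the pipeline's actual code; the normalization statistics are the precomputed MNIST values cited above.

```python
import random
from collections import defaultdict

# Precomputed MNIST normalization statistics (shared by training and inference)
MNIST_MEAN = 0.1307
MNIST_STD = 0.3081

def stratified_split(labels, val_fraction=0.2, seed=42):
    """Return (train_idx, val_idx), preserving per-class proportions.

    Indices are grouped by class, shuffled deterministically, and the
    requested fraction of each class is carved off for validation, so the
    80/20 split keeps the class distribution of the full dataset.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_val = int(len(idxs) * val_fraction)
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return train_idx, val_idx
```

Because each class is split independently, a rare digit is never accidentally absent from the validation set, which a naive random split can cause.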
Step 2: Model Training
Train a three-layer CNN (DigitCNN) on the preprocessed MNIST data with configurable hyperparameters. The model architecture includes batch normalization and dropout for regularization. A ModelConfig class exposes learning rate, optimizer choice, early stopping, and other training parameters. The trained model is persisted with a descriptive filename encoding the training configuration.
Key considerations:
- The CNN architecture (DigitCNN) uses three convolutional layers with batch normalization and dropout
- ModelConfig provides configurable hyperparameters without code changes
- Early stopping prevents overfitting by monitoring validation loss
- Model filenames encode training configuration for experiment tracking
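A minimal sketch of the configuration and early-stopping pieces, assuming a dataclass-style `ModelConfig`; the field names, defaults, and filename scheme here are hypothetical stand-ins for the pipeline's actual definitions.

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    """Training hyperparameters, adjustable without code changes."""
    learning_rate: float = 0.001
    optimizer: str = "adam"
    epochs: int = 10
    batch_size: int = 64
    early_stopping_patience: int = 3

    def model_filename(self) -> str:
        """Encode the training configuration into a descriptive filename."""
        return (f"digit_cnn_lr{self.learning_rate}_{self.optimizer}"
                f"_ep{self.epochs}_bs{self.batch_size}.pt")

class EarlyStopping:
    """Stop training when validation loss fails to improve for `patience` epochs."""
    def __init__(self, patience: int = 3):
        self.patience = patience
        self.best = float("inf")
        self.counter = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best:
            self.best = val_loss
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience
```

Encoding the configuration into the filename means a directory listing doubles as a lightweight experiment log.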
Step 3: Model Evaluation
Evaluate the trained model on the validation set, computing per-class precision, recall, and F1 scores along with a confusion matrix. The evaluation asset depends on both the model and the processed data assets, ensuring consistent data/model pairing. All metrics are recorded as Dagster metadata for historical comparison.
Key considerations:
- Per-class metrics identify categories where the model underperforms
- Confusion matrix visualization highlights systematic misclassification patterns
- Metrics are stored as asset metadata for cross-run comparison in the Dagster UI
- Evaluation uses scikit-learn metrics for standardized computation
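The pipeline delegates metric computation to scikit-learn; the sketch below re-derives the same per-class quantities in plain Python to make explicit what is being computed. The helper names are illustrative, not the pipeline's API.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """cm[t][p] counts samples with true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def per_class_metrics(cm):
    """Precision, recall, and F1 for each class, derived from the matrix."""
    n = len(cm)
    metrics = {}
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp   # column minus diagonal
        fn = sum(cm[c]) - tp                        # row minus diagonal
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics
```

Reading a row of the matrix shows where a true class leaks to; reading a column shows which classes are mistaken for it, which is how systematic misclassification patterns surface.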
Step 4: Deployment with Quality Gates
Determine whether to deploy the trained model based on configurable quality thresholds. Three deployment strategies are supported: quality-based automatic deployment (model must exceed accuracy threshold), manual selection (human decision), and force deployment (bypass quality gates). An abstract storage interface supports both local filesystem and S3 backends.
Key considerations:
- Quality gates enforce minimum performance thresholds before production deployment
- Multiple deployment strategies accommodate different operational requirements
- Storage abstraction enables local development with production S3 deployment
- Deployment decisions are logged as metadata for audit trails
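The decision logic and storage abstraction can be sketched as follows. The function signature, strategy strings, and class names are assumptions for illustration; only the three strategies themselves come from the workflow above.

```python
import shutil
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path

def deployment_decision(accuracy: float, strategy: str = "quality",
                        threshold: float = 0.97, approved: bool = False) -> bool:
    """Return True if the model should be deployed under the given strategy."""
    if strategy == "force":
        return True                   # bypass quality gates entirely
    if strategy == "manual":
        return approved               # defer to a human decision
    if strategy == "quality":
        return accuracy >= threshold  # automatic gate on accuracy
    raise ValueError(f"unknown strategy: {strategy}")

class ModelStorage(ABC):
    """Abstract backend so local development and S3 production share one interface."""
    @abstractmethod
    def save(self, src: Path, name: str) -> str: ...

class LocalStorage(ModelStorage):
    """Filesystem backend; an S3 backend would implement the same interface."""
    def __init__(self, root: Path):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, src: Path, name: str) -> str:
        dest = self.root / name
        shutil.copy(src, dest)
        return str(dest)
```

Because callers depend only on `ModelStorage`, switching from local development to production S3 is a construction-time choice rather than a code change.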
Step 5: Inference Services
Provide batch and real-time prediction capabilities using the deployed model. The inference layer loads the most recently deployed model and applies consistent preprocessing (same normalization as training). Predictions include confidence scores and class labels.
Key considerations:
- Inference preprocessing must exactly match training preprocessing for valid results
- Both batch (file-based) and real-time (single-image) prediction modes are supported
- The inference asset depends on the deployed model asset for automatic lineage tracking
- Confidence scores accompany predictions for downstream decision making
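A sketch of the inference-side preprocessing and confidence computation, assuming raw 0-255 pixel input and raw model logits; the helper names are hypothetical, and the normalization constants are the same precomputed MNIST statistics used in Step 1.

```python
import math

# Must exactly match the training-time constants, or predictions are invalid.
MNIST_MEAN = 0.1307
MNIST_STD = 0.3081

def preprocess(pixels):
    """Scale raw 0-255 pixels to [0, 1], then apply the training normalization."""
    return [((p / 255.0) - MNIST_MEAN) / MNIST_STD for p in pixels]

def softmax(logits):
    """Convert raw logits to probabilities (max-subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Return (class_label, confidence) from the model's output logits."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]
```

Keeping the normalization constants in one shared module, as the workflow's centralized constants do, is what guarantees the training/inference preprocessing match that the first consideration above requires.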