Principle:Dagster io Dagster Data Quality Validation
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Data_Quality |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Mechanism for defining and executing data quality assertions as first-class objects attached to specific assets in the data pipeline.
Description
Asset checks are declarative data quality tests that validate the contents of an asset after materialization. Unlike ad-hoc assertions embedded within asset computation code, asset checks are visible in the Dagster UI, can block downstream execution, and produce structured results with metadata.
Key characteristics of asset checks include:
- First-Class Status: Checks are registered as pipeline objects alongside assets, not hidden inside asset code.
- Severity Levels: Checks support `ERROR` (hard failure) and `WARN` (advisory) severity levels.
- Blocking Behavior: When `blocking=True`, a failed check prevents downstream assets from materializing.
- Structured Results: Check outcomes include pass/fail status, arbitrary metadata, and severity, all displayed in the UI.
- Cross-Asset Validation: Checks can reference additional assets as inputs via `additional_ins`, enabling validation across multiple data sources.
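The characteristics above can be modeled in a few lines of plain Python. This is a simplified sketch, not Dagster's API: `Severity`, `CheckResult`, and `check_orders_reference_customers` are hypothetical names chosen for illustration, and the `customers` argument stands in for the kind of second input that `additional_ins` would supply.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    """Hypothetical stand-in for the two severity levels described above."""
    WARN = "WARN"    # advisory
    ERROR = "ERROR"  # hard failure


@dataclass
class CheckResult:
    """Structured outcome: pass/fail status, severity, and arbitrary metadata."""
    passed: bool
    severity: Severity = Severity.ERROR
    metadata: dict = field(default_factory=dict)


def check_orders_reference_customers(orders, customers):
    """Cross-asset check: every order must point at a known customer.

    `orders` is the asset under test; `customers` plays the role of an
    additional input that the check reads but does not validate.
    """
    known_ids = {c["id"] for c in customers}
    orphans = [o["id"] for o in orders if o["customer_id"] not in known_ids]
    return CheckResult(
        passed=not orphans,
        severity=Severity.ERROR,
        metadata={"orphan_order_ids": orphans, "orders_checked": len(orders)},
    )


# Illustrative data only.
customers = [{"id": 1}, {"id": 2}]
orders = [
    {"id": 10, "customer_id": 1},
    {"id": 11, "customer_id": 3},  # no such customer
]
result = check_orders_reference_customers(orders, customers)
print(result.passed, result.metadata)
```

In Dagster itself, the corresponding result is an `AssetCheckResult` returned from a function decorated with `@asset_check`; the sketch only mirrors its shape (`passed`, `severity`, `metadata`).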
Usage
Use asset checks when data quality must be validated after asset materialization. They are especially valuable when downstream assets should be blocked on quality failures, when validation results need to be visible and trackable in the Dagster UI, or when quality metrics must be monitored over time. Common checks include null value detection, row count thresholds, schema validation, and referential integrity checks.
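Two of the common checks named above, null value detection and a row count threshold, might look like the following when written as plain functions. This is a minimal sketch that assumes the asset is a list of dictionaries; the function names, the `amount` column, and the result shape are hypothetical and not part of any Dagster API.

```python
def check_no_nulls(rows, column):
    """Pass only if no row has None in `column`; report how many did."""
    null_count = sum(1 for row in rows if row.get(column) is None)
    return {"passed": null_count == 0, "metadata": {"null_count": null_count}}


def check_min_row_count(rows, minimum):
    """Pass only if the asset has at least `minimum` rows."""
    return {"passed": len(rows) >= minimum, "metadata": {"row_count": len(rows)}}


# Illustrative data only.
rows = [
    {"order_id": 1, "amount": 9.5},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": 4.0},
]

nulls = check_no_nulls(rows, "amount")
count = check_min_row_count(rows, minimum=2)
print(nulls)  # {'passed': False, 'metadata': {'null_count': 1}}
print(count)  # {'passed': True, 'metadata': {'row_count': 3}}
```

Returning the measured value (`null_count`, `row_count`) alongside the verdict is what makes it possible to monitor the metric over time rather than only seeing pass or fail.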
Theoretical Basis
Data quality validation follows the test-after-write pattern common in data engineering. By making checks first-class pipeline objects (rather than inline assertions), the system can enforce quality gates, track validation history, and provide observability into data health across the entire pipeline.
The blocking parameter implements a circuit-breaker pattern for data pipelines. When a quality check fails with blocking enabled, the system halts propagation of potentially corrupt data to downstream consumers, preventing cascading data quality issues.
    # Pseudocode illustrating the quality gate pattern
    asset("raw_data")
    check("raw_data has no nulls", asset="raw_data", blocking=True)
    asset("clean_data", deps=["raw_data"])
    # If the check fails, clean_data will NOT be materialized
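The pseudocode above can be turned into a small runnable model of the circuit breaker. This is a toy scheduler written for illustration, not how Dagster executes runs: `run_pipeline`, its tuple-based asset format, and the extra `report` asset are all assumptions introduced here.

```python
def run_pipeline(assets, blocking_checks):
    """Materialize assets in order, skipping anything downstream of a failed gate.

    assets: list of (name, deps, compute) in dependency order, where
        compute receives a dict of upstream values.
    blocking_checks: {asset_name: predicate(value) -> bool}.
    Returns (materialized_values, skipped_names).
    """
    values, blocked, skipped = {}, set(), []
    for name, deps, compute in assets:
        # The breaker is open if any upstream failed its check or was skipped.
        if any(dep in blocked for dep in deps):
            blocked.add(name)
            skipped.append(name)
            continue
        values[name] = compute({dep: values[dep] for dep in deps})
        check = blocking_checks.get(name)
        if check is not None and not check(values[name]):
            blocked.add(name)  # asset exists, but nothing may consume it
    return values, skipped


assets = [
    ("raw_data", [], lambda _: [1, None, 3]),
    ("clean_data", ["raw_data"], lambda up: [x * 2 for x in up["raw_data"]]),
    ("report", ["clean_data"], lambda up: sum(up["clean_data"])),
]
checks = {"raw_data": lambda rows: all(x is not None for x in rows)}

values, skipped = run_pipeline(assets, checks)
print(sorted(values), skipped)  # ['raw_data'] ['clean_data', 'report']
```

Note that `clean_data` would raise a `TypeError` on the `None` value if it ran, so the gate stops the failure from propagating rather than merely reporting it. The sketch treats every failed check as blocking; Dagster's documentation describes the block as applying to checks that fail with `ERROR` severity.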
This approach provides several theoretical benefits:
- Separation of Concerns: Validation logic is decoupled from computation logic, making both easier to maintain.
- Auditability: Every check execution produces a recorded result, creating an audit trail of data quality over time.
- Composability: Checks can be added, removed, or modified independently of the assets they validate.