Principle:Dagster io Dagster Data Quality Validation

Knowledge Sources	Dagster, Dagster Docs
Domains	Data_Engineering, Data_Quality
Last Updated	2026-02-10 00:00 GMT

Overview

A mechanism for defining and executing data quality assertions as first-class objects attached to specific assets in a data pipeline.

Description

Asset checks are declarative data quality tests that validate the contents of an asset after materialization. Unlike ad-hoc assertions embedded within asset computation code, asset checks are visible in the Dagster UI, can block downstream execution, and produce structured results with metadata.

Key characteristics of asset checks include:

First-Class Status: Checks are registered as pipeline objects alongside assets, not hidden inside asset code.
Severity Levels: Checks support ERROR (hard failure) and WARN (advisory) severity levels.
Blocking Behavior: When blocking=True, downstream assets in the same run wait for the check to finish, and a check that fails with ERROR severity prevents them from materializing.
Structured Results: Check outcomes include pass/fail status, arbitrary metadata, and severity, all displayed in the UI.
Cross-Asset Validation: Checks can reference additional assets as inputs via additional_ins, enabling validation across multiple data sources.

Usage

Use asset checks when data quality must be validated after asset materialization. They are especially valuable when downstream assets should be blocked on quality failures, when validation results need to be visible and trackable in the Dagster UI, or when quality metrics must be monitored over time. Common checks include null value detection, row count thresholds, schema validation, and referential integrity checks.
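
As a rough, framework-independent sketch, the common checks listed above reduce to small predicates over the materialized data. The function names and the list-of-dicts row format are assumptions made for illustration; inside Dagster, each outcome would typically be wrapped in a check result.

```python
def null_count(rows, column):
    """Null value detection: number of rows where `column` is missing or None."""
    return sum(1 for row in rows if row.get(column) is None)


def row_count_in_range(rows, minimum, maximum=None):
    """Row count threshold: True when the number of rows is within bounds."""
    count = len(rows)
    return count >= minimum and (maximum is None or count <= maximum)


def schema_mismatches(rows, expected_types):
    """Schema validation: (row index, column) pairs with a wrong or absent value."""
    return [
        (index, column)
        for index, row in enumerate(rows)
        for column, expected in expected_types.items()
        if not isinstance(row.get(column), expected)
    ]


def orphaned_keys(child_rows, foreign_key, parent_rows, primary_key):
    """Referential integrity: foreign-key values with no matching parent row."""
    parent_keys = {row[primary_key] for row in parent_rows}
    return sorted({row[foreign_key] for row in child_rows} - parent_keys)


# Hypothetical sample data used only to exercise the predicates.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0},
    {"order_id": 2, "customer_id": 99, "amount": None},
]
customers = [{"customer_id": 10}]

print(null_count(orders, "amount"))
print(row_count_in_range(orders, minimum=1))
print(schema_mismatches(orders, {"order_id": int, "amount": float}))
print(orphaned_keys(orders, "customer_id", customers, "customer_id"))
```

Returning counts or offending keys rather than a bare boolean keeps enough detail to attach as metadata to a check result.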

Theoretical Basis

Data quality validation follows the test-after-write pattern common in data engineering. By making checks first-class pipeline objects (rather than inline assertions), the system can enforce quality gates, track validation history, and provide observability into data health across the entire pipeline.

The blocking parameter implements a circuit-breaker pattern for data pipelines. When a quality check fails with blocking enabled, the system halts propagation of potentially corrupt data to downstream consumers, preventing cascading data quality issues.

# Pseudocode illustrating the quality gate pattern
asset("raw_data")
check("raw_data has no nulls", asset="raw_data", blocking=True)
asset("clean_data", deps=["raw_data"])
# If the check fails, clean_data will NOT be materialized

This approach provides several theoretical benefits:

Separation of Concerns: Validation logic is decoupled from computation logic, making both easier to maintain.
Auditability: Every check execution produces a recorded result, creating an audit trail of data quality over time.
Composability: Checks can be added, removed, or modified independently of the assets they validate.
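
The circuit-breaker rule and the benefits listed above can be illustrated with a toy model in plain Python. It does not reflect how Dagster schedules work internally; run_pipeline, its arguments, and the status strings are hypothetical. Checks live in their own list (separation of concerns, composability), every execution is appended to a log (auditability), and a failed blocking check stops everything downstream.

```python
def run_pipeline(assets, deps, checks):
    """Toy quality gate.

    assets: dict of asset name -> zero-argument compute function, listed
            upstream first.
    deps:   dict of asset name -> list of upstream asset names.
    checks: list of (asset name, check name, predicate, blocking) tuples.
    Returns (status per asset, audit log of every check execution).
    """
    status, audit_log, blocked = {}, [], set()
    for name, compute in assets.items():
        upstream = deps.get(name, [])
        # Skip when an upstream asset is gated or was itself skipped.
        if any(up in blocked or status.get(up) == "skipped" for up in upstream):
            status[name] = "skipped"
            continue
        value = compute()
        status[name] = "materialized"
        for target, check_name, predicate, blocking in checks:
            if target != name:
                continue
            passed = predicate(value)
            audit_log.append({"asset": name, "check": check_name, "passed": passed})
            if blocking and not passed:
                blocked.add(name)  # trip the breaker for downstream assets
    return status, audit_log


def no_nulls(rows):
    return None not in rows


assets = {
    "raw_data": lambda: [1, None, 3],
    "clean_data": lambda: [1, 3],
    "report": lambda: "summary",
}
deps = {"clean_data": ["raw_data"], "report": ["clean_data"]}

gated_status, gated_log = run_pipeline(
    assets, deps, [("raw_data", "no_nulls", no_nulls, True)]
)
advisory_status, advisory_log = run_pipeline(
    assets, deps, [("raw_data", "no_nulls", no_nulls, False)]
)
```

With the blocking check, clean_data and report end up skipped; with the same check marked non-blocking, all three assets materialize and the failure is only recorded in the log.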

Related Pages

Implemented By

Implementation:Dagster_io_Dagster_Asset_Check_Decorator
