Principle:Dagster io Dagster Materialization Metadata
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Observability |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Mechanism for attaching structured metadata to asset materialization events to enable observability, lineage tracking, and data-driven decision making.
Description
Materialization metadata allows assets to report structured information alongside the materialization event. This includes row counts, data previews, quality scores, file paths, processing durations, and any other measurable output. The metadata is displayed in the Dagster UI, can be used in downstream decision logic, and provides an audit trail of what was produced during each run.
The metadata system supports a variety of typed values:
- Numeric values: Integers and floats for counts, scores, and measurements.
- Text values: Strings for labels, paths, and descriptions.
- Markdown: Rich text rendered in the UI for data previews and formatted reports.
- JSON: Structured data for complex metadata.
- URLs: Links to external resources (dashboards, reports, artifacts).
MaterializeResult is the primary vehicle for returning metadata from asset functions. It can also carry inline check results and data version information.
Usage
Use materialization metadata when asset computations produce measurable outputs that should be tracked over time. This is essential for ML pipelines (tracking model accuracy, loss, and hyperparameters), data pipelines (row counts, schema drift, processing time), and any pipeline requiring observability into what was produced at each step.
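
For the data-pipeline case, the measurements mentioned above (row count, schema drift, processing time) might be collected by a small helper before being attached to a materialization. This is a framework-independent sketch; the function `summarize_batch`, its parameters, and the sample rows are hypothetical.

```python
import time
from typing import Any, Iterable, Mapping


def summarize_batch(
    rows: Iterable[Mapping[str, Any]],
    expected_columns: set[str],
) -> dict[str, Any]:
    """Collect row count, schema drift, and elapsed time for one batch."""
    started = time.perf_counter()
    row_count = 0
    seen_columns: set[str] = set()
    for row in rows:
        row_count += 1
        seen_columns.update(row.keys())
    elapsed = time.perf_counter() - started
    return {
        "row_count": row_count,
        # Schema drift: columns that disappeared or appeared unexpectedly.
        "missing_columns": sorted(expected_columns - seen_columns),
        "unexpected_columns": sorted(seen_columns - expected_columns),
        "processing_time_seconds": elapsed,
    }


rows = [
    {"order_id": 1, "amount": 9.5},
    {"order_id": 2, "amount": 20.0, "coupon": "SPRING"},
]
summary = summarize_batch(rows, expected_columns={"order_id", "amount", "currency"})
print(summary["row_count"], summary["missing_columns"], summary["unexpected_columns"])
```

Recording the same keys on every run is what makes the values comparable over time, so a fixed set of metadata keys per asset is usually preferable to ad hoc ones.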
Theoretical Basis
Metadata-enriched materializations extend the basic event model with structured annotations. This follows the observability pattern in which every computation publishes its results in a machine-readable format, so that downstream systems (UI, alerting, automation) can react to them.
The theoretical foundation rests on several concepts:
- Event Sourcing: Each materialization is an immutable event in the system log. Attaching metadata to events creates a rich, queryable history of pipeline execution.
- Structured Logging: Unlike free-text logs, structured metadata enables programmatic analysis, aggregation, and visualization.
- Feedback Loops: Metadata lets automation react to past materialization outcomes (e.g., a sensor that triggers a re-materialization when the row count drops below a threshold).
# Sketch of the metadata enrichment pattern with Dagster's Python API
import dagster as dg

@dg.asset
def orders() -> dg.MaterializeResult:
    # Illustrative values; a real asset would compute them from its output.
    return dg.MaterializeResult(
        metadata={
            "row_count": 15000,
            "processing_time_seconds": 12.5,
            "schema_version": "v2",
        }
    )

# This metadata is:
# 1. Displayed in the UI for human inspection
# 2. Stored in the event log for historical analysis
# 3. Available to automation logic for decision making
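
The event-sourcing and feedback-loop ideas can also be sketched without any framework. The toy model below is an assumption-laden illustration, not Dagster's implementation: `MaterializationEvent`, `EventLog`, and `should_rematerialize` are hypothetical names, and the log is a plain in-memory list.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class MaterializationEvent:
    """One materialization record; frozen so its fields cannot be reassigned."""
    asset: str
    run_id: str
    metadata: dict[str, Any] = field(default_factory=dict)


class EventLog:
    """Append-only log: events are added, never updated or removed."""

    def __init__(self) -> None:
        self._events: list[MaterializationEvent] = []

    def append(self, event: MaterializationEvent) -> None:
        self._events.append(event)

    def history(self, asset: str, key: str) -> list[Any]:
        """Return every recorded value of one metadata key, oldest first."""
        return [
            e.metadata[key]
            for e in self._events
            if e.asset == asset and key in e.metadata
        ]


def should_rematerialize(log: EventLog, asset: str, min_rows: int) -> bool:
    """Feedback rule: flag the asset when its latest row count is too low."""
    counts = log.history(asset, "row_count")
    return bool(counts) and counts[-1] < min_rows


log = EventLog()
log.append(MaterializationEvent("orders", "run-1", {"row_count": 15000}))
log.append(MaterializationEvent("orders", "run-2", {"row_count": 120}))

print(log.history("orders", "row_count"))
print(should_rematerialize(log, "orders", min_rows=1000))
```

Because the metadata is structured rather than free text, both the historical query and the threshold rule reduce to simple lookups by key.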