Principle:Dagster io Dagster Incremental Processing
| Attribute | Value |
|---|---|
| Title | Incremental Processing |
| Category | Principle |
| Domains | Data_Engineering, dbt, Incremental |
| Repository | Dagster_io_Dagster |
Overview
Strategy for processing only new or changed data within partitioned assets by combining Dagster partition windows with dbt incremental materialization.
Description
Incremental processing avoids reprocessing historical data by scoping each run to a specific time window. In the Dagster-dbt integration, this combines Dagster's partition definitions (which provide the time window) with dbt's is_incremental() macro (which gates the filter: it is true only when the target table already exists and the run is not a full refresh, so the first build still loads everything). Template variables bridge the two systems, passing Dagster partition time windows as dbt variables that control the SQL WHERE clause.
The mechanism works through several coordinated layers:
- Partition definitions: Dagster's DailyPartitionsDefinition (or other time-based partitions) assigns each materialization a time window with start and end timestamps.
- Template variable resolution: At execution time, the DbtProjectComponent resolves Jinja templates in cli_args, injecting context.partition_time_window.start and context.partition_time_window.end into dbt --vars.
- dbt incremental logic: The dbt model uses is_incremental() to conditionally apply a WHERE clause that filters data to the provided time window.
- Post-processing assignment: The post_processing YAML block assigns partition definitions to specific assets (e.g., by dbt tag).
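The first two layers can be sketched in plain Python to show what ends up on the dbt command line. This is a minimal stand-in that uses only the standard library, not Dagster's resolution code; the helper names and the variable names start_date and end_date are assumptions made for illustration.

```python
import json
from datetime import datetime, timedelta, timezone


def daily_window(partition_key: str) -> tuple[datetime, datetime]:
    """Return the half-open [start, end) UTC window for a 'YYYY-MM-DD' key."""
    start = datetime.strptime(partition_key, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return start, start + timedelta(days=1)


def dbt_cli_args(partition_key: str) -> list[str]:
    """Build dbt arguments whose --vars payload carries the window bounds."""
    start, end = daily_window(partition_key)
    # start_date / end_date are hypothetical names; the dbt model must read
    # the same names through var('start_date') and var('end_date').
    dbt_vars = {"start_date": start.isoformat(), "end_date": end.isoformat()}
    return ["build", "--vars", json.dumps(dbt_vars)]


args = dbt_cli_args("2024-01-15")
print(args)
```

Because the arguments are computed from the partition key at call time, the same function yields a different dbt invocation for every partition, which is the effect the templated cli_args achieve declaratively.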
Usage
Use for large datasets where full table refreshes are too expensive. The combination of Dagster daily partitions and dbt incremental models enables efficient processing of time-series data.
Key considerations:
- dbt models must use materialized='incremental' with the is_incremental() macro
- Dagster partition time windows must align with the data's time granularity
- The --vars bridge requires consistent variable names between YAML config and dbt SQL
- The @template_var decorator enables Python-computed values to be injected into YAML configuration
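The variable-name requirement is easy to break silently, so a small consistency check can help. The sketch below is an illustrative lint, not part of Dagster or dbt: the templated --vars string, the model SQL, the raw_events source, and the event_time column are all hypothetical examples.

```python
import re

# Hypothetical templated --vars payload as it might appear in YAML cli_args.
CLI_VARS_TEMPLATE = (
    '{"start_date": "{{ context.partition_time_window.start }}", '
    '"end_date": "{{ context.partition_time_window.end }}"}'
)

# Hypothetical dbt model body (Jinja SQL); table and column names are illustrative.
MODEL_SQL = """
{{ config(materialized='incremental') }}
select * from {{ ref('raw_events') }}
{% if is_incremental() %}
where event_time >= '{{ var("start_date") }}'
  and event_time < '{{ var("end_date") }}'
{% endif %}
"""


def vars_passed(template: str) -> set[str]:
    """Collect the JSON keys supplied through --vars."""
    return set(re.findall(r'"(\w+)"\s*:', template))


def vars_used(sql: str) -> set[str]:
    """Collect the names the model reads through var(...)."""
    return set(re.findall(r"var\(\s*['\"](\w+)['\"]", sql))


missing = vars_used(MODEL_SQL) - vars_passed(CLI_VARS_TEMPLATE)
unused = vars_passed(CLI_VARS_TEMPLATE) - vars_used(MODEL_SQL)
print("missing:", sorted(missing), "unused:", sorted(unused))
```

A non-empty missing set means the model reads a variable that the configuration never supplies, which dbt would report only at compile time.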
Theoretical Basis
Incremental processing implements the delta processing pattern. Instead of recomputing the full result set, only the delta (new/changed data) is processed and merged into the target. Dagster provides the partitioning primitive (time windows), dbt provides the merge strategy (INSERT/MERGE), and template variables bridge the two.
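The delta pattern itself fits in a few lines. The following sketch uses in-memory dictionaries in place of warehouse tables and a key-based upsert in place of dbt's merge; the row shape and the process_window helper are assumptions for illustration only.

```python
from datetime import datetime


def process_window(source, target, start, end):
    """Upsert only the source rows whose timestamp falls in [start, end)."""
    delta = [row for row in source if start <= row["updated_at"] < end]
    for row in delta:
        target[row["id"]] = row  # insert new keys, overwrite changed ones
    return len(delta)


source = [
    {"id": 1, "updated_at": datetime(2024, 1, 14, 9), "value": "a"},
    {"id": 2, "updated_at": datetime(2024, 1, 15, 8), "value": "b"},
    {"id": 1, "updated_at": datetime(2024, 1, 15, 12), "value": "a2"},
]
target = {1: source[0]}  # state left behind by the previous day's run

touched = process_window(source, target, datetime(2024, 1, 15), datetime(2024, 1, 16))
print(touched, target[1]["value"], sorted(target))
```

Only the two rows inside the 2024-01-15 window are read and merged; the row from the previous day is never reprocessed.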
Design principles at work:
- Separation of concerns: Dagster owns partition scheduling and time window computation. dbt owns the incremental merge strategy. Neither system needs to understand the other's internals.
- Late binding: CLI arguments are resolved at execution time via Jinja templates, allowing the same YAML configuration to produce different dbt invocations for different partitions.
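- Idempotency: Each partition window defines a deterministic scope. Re-running a partition replaces exactly the same data, ensuring idempotent results, provided the dbt incremental strategy replaces rows in the window (for example, a merge on a unique key) rather than only appending them.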
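The idempotency property can be checked with a small experiment. This sketch uses SQLite and a delete-then-insert scoped to the window as a simple stand-in for whatever incremental strategy the dbt adapter actually applies; the events table and load_partition helper are hypothetical.

```python
import sqlite3


def load_partition(conn, rows, start, end):
    """Replace the target rows inside [start, end) with freshly computed ones."""
    with conn:  # one transaction: the delete and the insert commit together
        conn.execute(
            "DELETE FROM events WHERE event_time >= ? AND event_time < ?",
            (start, end),
        )
        conn.executemany(
            "INSERT INTO events (event_time, value) VALUES (?, ?)",
            [r for r in rows if start <= r[0] < end],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_time TEXT, value INTEGER)")

source_rows = [
    ("2024-01-14T23:00", 1),
    ("2024-01-15T01:00", 2),
    ("2024-01-15T18:30", 3),
]

load_partition(conn, source_rows, "2024-01-15", "2024-01-16")
first = conn.execute("SELECT * FROM events ORDER BY event_time").fetchall()

load_partition(conn, source_rows, "2024-01-15", "2024-01-16")  # re-run the partition
second = conn.execute("SELECT * FROM events ORDER BY event_time").fetchall()

print(first == second, len(second))
```

A plain append without the scoped delete would double the rows on the second run, which is why the replacement step matters for safe retries and backfills.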
The @template_var decorator bridges Python-defined objects (like DailyPartitionsDefinition) into the YAML configuration layer, maintaining the declarative style while supporting objects that cannot be expressed in pure YAML.
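To make the bridging idea concrete, here is a toy model of the mechanism. It is not Dagster's implementation: the registry, the resolve function, the DailyPartitions stand-in class, and the shape of the parsed configuration are all assumptions chosen to keep the example self-contained.

```python
import re

_TEMPLATE_VARS = {}


def template_var(fn):
    """Toy decorator: register fn under its own name for later lookup."""
    _TEMPLATE_VARS[fn.__name__] = fn
    return fn


class DailyPartitions:
    """Stand-in for a partitions object that YAML cannot express directly."""

    def __init__(self, start_date):
        self.start_date = start_date


@template_var
def daily_partitions_def():
    return DailyPartitions(start_date="2024-01-01")


def resolve(value):
    """Swap a whole-string '{{ name }}' placeholder for the registered object."""
    if isinstance(value, dict):
        return {k: resolve(v) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v) for v in value]
    if isinstance(value, str):
        match = re.fullmatch(r"\{\{\s*(\w+)\s*\}\}", value)
        if match and match.group(1) in _TEMPLATE_VARS:
            return _TEMPLATE_VARS[match.group(1)]()
    return value


# Parsed form of a hypothetical post_processing block.
config = {
    "post_processing": {
        "assets": [
            {
                "target": "tag:daily",
                "attributes": {"partitions_def": "{{ daily_partitions_def }}"},
            }
        ]
    }
}

resolved = resolve(config)
partitions = resolved["post_processing"]["assets"][0]["attributes"]["partitions_def"]
print(type(partitions).__name__, partitions.start_date)
```

The configuration stays a plain string-valued document, while the Python object is produced only when the placeholder is resolved.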