Principle:Dagster io Dagster Incremental Processing

From Leeroopedia


Title: Incremental Processing
Category: Principle
Domains: Data_Engineering, dbt, Incremental
Repository: Dagster_io_Dagster

Overview

A strategy for processing only new or changed data in partitioned assets by combining Dagster partition time windows with dbt incremental materialization.

Description

Incremental processing avoids reprocessing historical data by scoping each run to a specific time window. In the Dagster-dbt integration, this combines Dagster's partition definitions (which provide the time window) with dbt's is_incremental() macro (which conditionally filters data). Template variables bridge the two systems, passing Dagster partition time windows as dbt variables that control the SQL WHERE clause.

The mechanism works through several coordinated layers:

  • Partition definitions: Dagster's DailyPartitionsDefinition (or other time-based partitions) assigns each materialization a time window with start and end timestamps.
  • Template variable resolution: At execution time, the DbtProjectComponent resolves Jinja templates in cli_args, injecting context.partition_time_window.start and context.partition_time_window.end into dbt --vars.
  • dbt incremental logic: The dbt model uses is_incremental() to conditionally apply a WHERE clause that filters data to the provided time window.
  • Post-processing assignment: The post_processing YAML block assigns partition definitions to specific assets (e.g., by dbt tag).
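
The first two layers can be sketched in plain Python. This is a minimal illustration using only the standard library, not the Dagster API: the TimeWindow tuple, the daily_window and dbt_cli_args helpers, and the variable names start_date and end_date are all hypothetical stand-ins for what the partition definition and the template resolution step would supply.

```python
import json
from datetime import datetime, timedelta
from typing import NamedTuple


class TimeWindow(NamedTuple):
    """Hypothetical stand-in for a partition time window (half-open: [start, end))."""
    start: datetime
    end: datetime


def daily_window(partition_key: str) -> TimeWindow:
    """Map a daily partition key such as '2024-01-02' to its one-day window."""
    start = datetime.strptime(partition_key, "%Y-%m-%d")
    return TimeWindow(start, start + timedelta(days=1))


def dbt_cli_args(window: TimeWindow) -> list[str]:
    """Build dbt CLI arguments that pass the window bounds through --vars.

    The variable names 'start_date' and 'end_date' are assumptions; they
    must match whatever names the dbt model reads with var().
    """
    dbt_vars = {
        "start_date": window.start.strftime("%Y-%m-%d"),
        "end_date": window.end.strftime("%Y-%m-%d"),
    }
    return ["build", "--vars", json.dumps(dbt_vars)]


if __name__ == "__main__":
    print(dbt_cli_args(daily_window("2024-01-02")))
```

Each partition key yields a different argument list from the same function, which mirrors how one YAML configuration can produce a distinct dbt invocation per partition.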

Usage

Use this pattern for large datasets where full table refreshes are too expensive. Combining Dagster daily partitions with dbt incremental models lets each run process only one time window of time-series data.

Key considerations:

  • dbt models must use materialized='incremental' with the is_incremental() macro
  • Dagster partition time windows must align with the data's time granularity
  • The --vars bridge requires consistent variable names between YAML config and dbt SQL
  • The @template_var decorator enables Python-computed values to be injected into YAML configuration
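
The variable-name consideration is easy to get wrong, since a mismatch only surfaces when dbt runs. The following sketch checks that every var() referenced in a model is supplied. The model SQL, the raw_events reference, the event_time column, and the names start_date and end_date are hypothetical, and the regular expression is a simplification that only handles quoted literal names.

```python
import re

# Hypothetical dbt incremental model; table, column, and variable names are illustrative.
MODEL_SQL = """
{{ config(materialized='incremental') }}
select * from {{ ref('raw_events') }}
{% if is_incremental() %}
where event_time >= '{{ var("start_date") }}'
  and event_time < '{{ var("end_date") }}'
{% endif %}
"""

# Matches var('name') or var("name"); a simplification, not a Jinja parser.
VAR_PATTERN = re.compile(r"""var\(\s*['"]([^'"]+)['"]""")


def referenced_vars(sql: str) -> set[str]:
    """Return the variable names the model reads through var()."""
    return set(VAR_PATTERN.findall(sql))


def missing_vars(sql: str, supplied: dict) -> set[str]:
    """Return the names the model expects but the --vars payload does not provide."""
    return referenced_vars(sql) - set(supplied)


if __name__ == "__main__":
    # 'start' does not match 'start_date', so the mismatch is reported.
    print(missing_vars(MODEL_SQL, {"start": "2024-01-02", "end_date": "2024-01-03"}))
```

A check like this could run in CI against the YAML configuration, so that a renamed variable is caught before a partition run fails.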

Theoretical Basis

Incremental processing implements the delta processing pattern. Instead of recomputing the full result set, only the delta (new/changed data) is processed and merged into the target. Dagster provides the partitioning primitive (time windows), dbt provides the merge strategy (INSERT/MERGE), and template variables bridge the two.

Design principles at work:

  • Separation of concerns: Dagster owns partition scheduling and time window computation. dbt owns the incremental merge strategy. Neither system needs to understand the other's internals.
  • Late binding: CLI arguments are resolved at execution time via Jinja templates, allowing the same YAML configuration to produce different dbt invocations for different partitions.
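  • Idempotency: Each partition window defines a deterministic scope. Re-running a partition is idempotent when the dbt incremental strategy replaces the rows in that window (for example, merge on a unique_key or delete+insert); a plain append strategy would insert the same rows again.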
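
The idempotency property can be illustrated with a small in-memory model of a window-scoped replace. This is a sketch under stated assumptions, not how dbt executes a merge: rows are plain dictionaries with hypothetical ts and id fields, and replace_window is an illustrative helper that imitates a delete-then-insert over the half-open window [start, end).

```python
from datetime import datetime


def replace_window(target: list[dict], source: list[dict],
                   start: datetime, end: datetime) -> list[dict]:
    """Replace the rows of target that fall in [start, end) with those from source.

    Rows outside the window are left untouched, so history is never reprocessed.
    """
    kept = [row for row in target if not (start <= row["ts"] < end)]
    fresh = [row for row in source if start <= row["ts"] < end]
    return sorted(kept + fresh, key=lambda row: (row["ts"], row["id"]))


if __name__ == "__main__":
    source = [
        {"id": 1, "ts": datetime(2024, 1, 1, 9)},
        {"id": 2, "ts": datetime(2024, 1, 2, 9)},
        {"id": 3, "ts": datetime(2024, 1, 2, 17)},
    ]
    target = [{"id": 1, "ts": datetime(2024, 1, 1, 9)}]
    start, end = datetime(2024, 1, 2), datetime(2024, 1, 3)

    once = replace_window(target, source, start, end)
    twice = replace_window(once, source, start, end)
    print(once == twice)  # re-running the same partition changes nothing
```

Because the window bounds fully determine which rows are deleted and which are inserted, a second run over the same partition reproduces the first result instead of duplicating rows.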

The @template_var decorator bridges Python-defined objects (like DailyPartitionsDefinition) into the YAML configuration layer, maintaining the declarative style while supporting objects that cannot be expressed in pure YAML.
