Principle:Dagster io Dagster Dbt Project Integration
| Attribute | Value |
|---|---|
| Title | Dbt Project Integration |
| Category | Principle |
| Domains | Data_Engineering, dbt |
| Repository | Dagster_io_Dagster |
Overview
Strategy for integrating dbt (data build tool) transformation projects into Dagster's asset graph through component-based configuration.
Description
dbt project integration maps dbt models to Dagster assets, enabling orchestration of SQL-based transformations alongside Python-based data processing. The DbtProjectComponent bridges dbt's model DAG with Dagster's asset DAG, translating dbt source() and ref() references into asset dependencies, dbt tests into asset checks, and dbt materializations into Dagster materialization events. Configuration is YAML-driven, using Dagster's component system.
The integration follows a layered approach:
- Discovery layer: The component reads the dbt project directory, locates dbt_project.yml, and parses the dbt manifest to discover all models, sources, seeds, and snapshots.
- Translation layer: Each dbt resource is mapped to a Dagster AssetSpec via the DagsterDbtTranslator. Asset keys, groups, dependencies, and metadata are derived from dbt resource properties.
- Execution layer: At materialization time, the component invokes the dbt CLI (typically dbt build) and streams dbt events back to Dagster as materialization and check results.
- Configuration layer: YAML attributes control project path, CLI arguments, translation overrides, and optional features like row count metadata.
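The discovery and translation layers can be illustrated with a minimal, standard-library-only sketch. It is not the dagster-dbt implementation: AssetSpecSketch, resource_key, translate_manifest, and SAMPLE_MANIFEST are hypothetical names, and the key convention (sources prefixed with their source name) is an assumption made for this example. The only fact relied on is the general shape of dbt's manifest.json, where nodes and sources are keyed by unique ID and each node lists its parents under depends_on.nodes.

```python
from dataclasses import dataclass

# Resource types treated as assets in this sketch; tests become checks instead.
ASSET_TYPES = {"model", "seed", "snapshot"}


@dataclass(frozen=True)
class AssetSpecSketch:
    """Hypothetical stand-in for a Dagster AssetSpec (not the real class)."""

    key: str
    deps: tuple
    checks: tuple


def resource_key(resource: dict) -> str:
    """Assumed key convention: sources are prefixed with their source name."""
    if resource["resource_type"] == "source":
        return f"{resource['source_name']}/{resource['name']}"
    return resource["name"]


def translate_manifest(manifest: dict) -> list:
    """Map each model, seed, and snapshot in a manifest to an AssetSpecSketch."""
    nodes = manifest.get("nodes", {})
    # Sources live in their own top-level section of manifest.json.
    lookup = {**manifest.get("sources", {}), **nodes}

    # Attach each test to the resources it depends on.
    checks = {}
    for node in nodes.values():
        if node["resource_type"] == "test":
            for parent_id in node.get("depends_on", {}).get("nodes", []):
                checks.setdefault(parent_id, []).append(node["name"])

    specs = []
    for unique_id, node in nodes.items():
        if node["resource_type"] not in ASSET_TYPES:
            continue
        parents = node.get("depends_on", {}).get("nodes", [])
        deps = sorted(resource_key(lookup[p]) for p in parents if p in lookup)
        specs.append(
            AssetSpecSketch(
                key=resource_key(node),
                deps=tuple(deps),
                checks=tuple(sorted(checks.get(unique_id, []))),
            )
        )
    return sorted(specs, key=lambda spec: spec.key)


# Hand-written manifest fragment used only for illustration.
SAMPLE_MANIFEST = {
    "nodes": {
        "model.shop.stg_orders": {
            "resource_type": "model",
            "name": "stg_orders",
            "depends_on": {"nodes": ["source.shop.raw.orders"]},
        },
        "model.shop.orders": {
            "resource_type": "model",
            "name": "orders",
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
        "test.shop.not_null_orders_id": {
            "resource_type": "test",
            "name": "not_null_orders_id",
            "depends_on": {"nodes": ["model.shop.orders"]},
        },
    },
    "sources": {
        "source.shop.raw.orders": {
            "resource_type": "source",
            "name": "orders",
            "source_name": "raw",
        },
    },
}

if __name__ == "__main__":
    for spec in translate_manifest(SAMPLE_MANIFEST):
        print(spec)
```

In this sketch the ref() from orders to stg_orders becomes an asset dependency, the source() reference becomes a dependency on an upstream key, and the dbt test is attached to its model as a check rather than emitted as an asset.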
Usage
Use when dbt projects need to be orchestrated as part of a broader data pipeline. The component-based approach is preferred for new projects, providing YAML-driven configuration and automatic asset key translation.
Typical scenarios include:
- Orchestrating dbt transformations downstream of Python-based ingestion assets
- Running dbt tests as Dagster asset checks for unified data quality monitoring
- Partitioning dbt models by time windows to enable incremental processing
- Scaffolding new dbt integrations via dg scaffold defs dagster_dbt.DbtProjectComponent
Theoretical Basis
The integration applies the adapter pattern, translating between two DAG-based systems (dbt's model graph and Dagster's asset graph). The YAML-based component system follows the convention-over-configuration principle, where standard dbt projects are auto-discovered and mapped without custom Python code. The translation layer (key templates, group mapping) provides escape hatches for non-standard configurations.
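The key-template escape hatch can be sketched as follows. This is a simplified stand-in that uses a regular expression rather than the Jinja engine, and the `{{ node.<field> }}` placeholder form, the function names, and the fallback to the node name are all assumptions made for illustration, not the component's actual resolution logic.

```python
import re

# Matches placeholders such as "{{ node.name }}" (assumed template form).
_PLACEHOLDER = re.compile(r"\{\{\s*node\.(\w+)\s*\}\}")


def render_key_template(template: str, node: dict) -> str:
    """Substitute each {{ node.<field> }} placeholder with the node's value."""

    def substitute(match):
        return str(node[match.group(1)])

    return _PLACEHOLDER.sub(substitute, template)


def translate_key(node: dict, template=None) -> str:
    """Convention first: use the node name unless a template overrides it."""
    if template is None:
        return node["name"]
    return render_key_template(template, node)


if __name__ == "__main__":
    node = {"name": "orders", "schema": "analytics"}
    print(translate_key(node))
    print(translate_key(node, "{{ node.schema }}/{{ node.name }}"))
```

The design point is that a standard project needs no template at all, while a non-standard one overrides only the key mapping and leaves discovery and execution untouched.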
The component architecture separates concerns:
- State management: The StateBackedComponent base class handles dbt project preparation (parsing, compiling) as a build step separate from runtime execution.
- Resolution: Jinja-based template resolution allows dynamic values (partition keys, environment variables) to be injected into otherwise static YAML configuration.
- Subsetting: The can_subset=True flag on the generated multi-asset enables Dagster to materialize individual dbt models without running the full project.
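Subsetting can be pictured as narrowing the dbt invocation to the requested models. The sketch below is a hypothetical helper, not the logic dagster-dbt uses internally: it assumes asset keys equal dbt model names and relies only on dbt's --select option accepting a space-separated list of model names.

```python
def build_dbt_args(selected, all_models, command="build"):
    """Return dbt CLI arguments that run only the selected models.

    Hypothetical helper: assumes each asset key is also the dbt model name.
    """
    selected = set(selected)
    known = set(all_models)

    unknown = selected - known
    if unknown:
        raise ValueError(f"Unknown models requested: {sorted(unknown)}")

    # Nothing to narrow when every model is requested: run the whole project.
    if selected == known:
        return [command]

    # Otherwise pass a space-separated selection string to --select.
    return [command, "--select", " ".join(sorted(selected))]


if __name__ == "__main__":
    models = ["orders", "stg_orders", "customers"]
    print(build_dbt_args({"orders", "customers"}, models))
    print(build_dbt_args(models, models))
```

Without subsetting, a request to materialize a single model would still have to run the whole project, since the multi-asset would be an all-or-nothing unit.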