Principle:Dagster io Dagster Dbt Project Integration

From Leeroopedia


Attribute    Value
Title        Dbt Project Integration
Category     Principle
Domains      Data_Engineering, dbt
Repository   Dagster_io_Dagster

Overview

Strategy for integrating dbt (data build tool) transformation projects into Dagster's asset graph through component-based configuration.

Description

dbt project integration maps dbt models to Dagster assets, so SQL-based transformations can be orchestrated alongside Python-based data processing. The DbtProjectComponent bridges dbt's model DAG and Dagster's asset DAG: dbt source() and ref() references become asset dependencies, dbt tests become asset checks, and completed dbt model runs become Dagster materialization events. Configuration is YAML-driven and uses Dagster's component system.

The integration follows a layered approach:

  • Discovery layer: The component reads the dbt project directory, locates dbt_project.yml, and parses the dbt manifest to discover all models, sources, seeds, and snapshots.
  • Translation layer: Each dbt resource is mapped to a Dagster AssetSpec via the DagsterDbtTranslator. Asset keys, groups, dependencies, and metadata are derived from dbt resource properties.
  • Execution layer: At materialization time, the component invokes the dbt CLI (typically dbt build) and streams dbt events back to Dagster as materialization and check results.
  • Configuration layer: YAML attributes control project path, CLI arguments, translation overrides, and optional features like row count metadata.
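The discovery and translation layers can be sketched in plain Python. This is a minimal illustration, not the DagsterDbtTranslator itself: the manifest below is a heavily trimmed stand-in for dbt's manifest.json, and SketchAssetSpec, key_for, and translate are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field

# Trimmed stand-in for dbt's manifest.json (assumption: real manifests
# carry many more fields per entry; "shop" is a made-up project name).
MANIFEST = {
    "sources": {
        "source.shop.raw.orders": {
            "resource_type": "source",
            "source_name": "raw",
            "name": "orders",
        },
    },
    "nodes": {
        "model.shop.stg_orders": {
            "resource_type": "model",
            "name": "stg_orders",
            "depends_on": {"nodes": ["source.shop.raw.orders"]},
        },
        "model.shop.daily_revenue": {
            "resource_type": "model",
            "name": "daily_revenue",
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
        "test.shop.not_null_stg_orders_id": {
            "resource_type": "test",
            "name": "not_null_stg_orders_id",
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
    },
}

# Resource types that become assets in this sketch.
ASSET_TYPES = {"model", "seed", "snapshot"}


@dataclass
class SketchAssetSpec:
    """Hypothetical, simplified analogue of an asset specification."""
    key: tuple
    deps: list = field(default_factory=list)
    checks: list = field(default_factory=list)


def key_for(unique_id, manifest):
    """Derive an asset key from a dbt unique_id (illustrative convention)."""
    if unique_id in manifest["sources"]:
        src = manifest["sources"][unique_id]
        return (src["source_name"], src["name"])
    return (manifest["nodes"][unique_id]["name"],)


def translate(manifest):
    """Map models to specs, references to deps, and tests to checks."""
    specs = {}
    for unique_id, node in manifest["nodes"].items():
        if node["resource_type"] in ASSET_TYPES:
            deps = [key_for(d, manifest) for d in node["depends_on"]["nodes"]]
            specs[unique_id] = SketchAssetSpec(key_for(unique_id, manifest), deps)
    # Attach each test to the assets it depends on, as a check name.
    for node in manifest["nodes"].values():
        if node["resource_type"] == "test":
            for parent in node["depends_on"]["nodes"]:
                if parent in specs:
                    specs[parent].checks.append(node["name"])
    return list(specs.values())


specs = translate(MANIFEST)
```

In this sketch the test node does not become an asset of its own; it is attached to the model it depends on, mirroring the idea that dbt tests surface as asset checks.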

Usage

Use this approach when a dbt project needs to be orchestrated as part of a broader data pipeline. The component-based approach is preferred for new projects because it provides YAML-driven configuration and automatic asset key translation.

Typical scenarios include:

  • Orchestrating dbt transformations downstream of Python-based ingestion assets
  • Running dbt tests as Dagster asset checks for unified data quality monitoring
  • Partitioning dbt models by time windows to enable incremental processing
  • Scaffolding new dbt integrations via dg scaffold defs dagster_dbt.DbtProjectComponent

Theoretical Basis

The integration applies the adapter pattern, translating between two DAG-based systems (dbt's model graph and Dagster's asset graph). The YAML-based component system follows the convention-over-configuration principle, where standard dbt projects are auto-discovered and mapped without custom Python code. The translation layer (key templates, group mapping) provides escape hatches for non-standard configurations.
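The relationship between the default convention and a key-template escape hatch can be illustrated with a small sketch. It assumes a simplified node dictionary and uses Python's string.Template as a stand-in for the real templating mechanism; asset_key and the template string are hypothetical and do not reflect the component's actual syntax.

```python
from string import Template


def asset_key(node, key_template=None):
    """Return the asset key for a dbt node.

    With no template, fall back to the convention (the model name).
    With a template, substitute node properties into it.
    """
    if key_template is None:
        return node["name"]
    return Template(key_template).substitute(
        name=node["name"], schema=node["schema"]
    )


# Hypothetical node with only the two properties this sketch needs.
node = {"name": "customers", "schema": "analytics"}

default_key = asset_key(node)                            # convention
custom_key = asset_key(node, "warehouse/$schema/$name")  # override
```

A standard project never passes a template and gets sensible keys for free; a project with an unusual layout supplies one template rather than writing custom translation code.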

The component architecture separates concerns:

  • State management: The StateBackedComponent base class handles dbt project preparation (parsing, compiling) as a build step separate from runtime execution.
  • Resolution: Jinja-based template resolution allows dynamic values (partition keys, environment variables) to be injected into otherwise static YAML configuration.
  • Subsetting: The can_subset=True flag on the generated multi-asset enables Dagster to materialize individual dbt models without running the full project.
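Subsetting can be pictured as turning the set of requested assets into a dbt selection. The following is a rough sketch under the assumption that asset keys match dbt model names one to one; build_dbt_args is a hypothetical helper, and only the dbt build command and its --select flag are taken from the dbt CLI.

```python
def build_dbt_args(selected_keys, all_keys, base_args=("build",)):
    """Build dbt CLI arguments for a full or partial materialization.

    selected_keys: asset keys requested for this run.
    all_keys: every asset key the multi-asset can produce.
    """
    wanted = set(selected_keys)
    # Keep the project's own ordering so the arguments are deterministic.
    selected = [key for key in all_keys if key in wanted]
    if not selected:
        raise ValueError("no known assets were selected")
    args = list(base_args)
    # Only narrow the run when a strict subset was requested.
    if len(selected) < len(all_keys):
        args += ["--select", *selected]
    return args


ALL_KEYS = ["stg_orders", "daily_revenue"]

full_run = build_dbt_args(ALL_KEYS, ALL_KEYS)
partial_run = build_dbt_args(["daily_revenue"], ALL_KEYS)
```

When every asset is requested the sketch omits the selection entirely, so a full run uses the same command as an unsubsetted project run.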
