Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Principle:Astronomer Astronomer cosmos Airflow Task Dependency Wiring

From Leeroopedia


Knowledge Sources
Domains Orchestration, DAG_Rendering
Last Updated 2026-02-07 00:00 GMT

Overview

An orchestration pattern for establishing execution order between dbt task groups and non-dbt operators within an Airflow DAG.

Description

After Cosmos renders dbt nodes as TaskGroups or individual operators, they must be connected to the rest of the DAG's task graph. Airflow's >> (set_downstream) and << (set_upstream) operators define the execution order between tasks. This is essential for mixed pipelines where:

  • Data extraction precedes dbt transformation
  • dbt model completion triggers downstream notifications or data quality checks
  • Multiple dbt task groups need to be sequenced (e.g., staging before marts)
  • External validation or approval gates separate pipeline phases

The wiring pattern leverages the fact that Airflow's TaskGroup exposes the same dependency interface as individual operators. This means an entire dbt subgraph (potentially containing dozens of model/test tasks) can be wired as a single unit in the dependency chain. Under the hood, Airflow connects the upstream task to all root tasks (tasks with no upstream within the group) and connects all leaf tasks (tasks with no downstream within the group) to the downstream task.

Usage

Use dependency wiring when DbtTaskGroup instances need to be sequenced with other Airflow operators, such as:

  • Data ingestion operators (e.g., S3ToSnowflakeOperator, GCSToBigQueryOperator) that must complete before dbt transforms the data
  • Data quality operators (e.g., GreatExpectationsOperator) that validate dbt output
  • Notification operators (e.g., SlackWebhookOperator, EmailOperator) that alert on pipeline completion
  • Other DbtTaskGroup instances that represent different dbt projects or different subsets of the same project

Theoretical Basis

Airflow's dependency operators are implemented via Python's dunder methods on BaseOperator and TaskGroup:

# BaseOperator.__rshift__ creates a downstream edge
def __rshift__(self, other):
    self.set_downstream(other)
    return other

# BaseOperator.__lshift__ creates an upstream edge
def __lshift__(self, other):
    self.set_upstream(other)
    return other

When applied to a TaskGroup, the dependency is distributed across the group's boundary tasks:

pre_task >> dbt_task_group

# Equivalent to:
for root_task in dbt_task_group.roots:
    pre_task.set_downstream(root_task)
dbt_task_group >> post_task

# Equivalent to:
for leaf_task in dbt_task_group.leaves:
    leaf_task.set_downstream(post_task)

This means:

  • Upstream wiring (pre_task >> dbt_task_group) -- the pre-task must complete before any dbt task in the group starts
  • Downstream wiring (dbt_task_group >> post_task) -- all dbt tasks in the group must complete before the post-task starts
  • Chaining (pre_task >> dbt_task_group >> post_task) -- creates a full pipeline with the dbt subgraph sandwiched between pre and post processing

Multiple task groups can also be chained:

extract_task >> staging_dbt_group >> marts_dbt_group >> notify_task

The key insight is that DbtTaskGroup inherits from Airflow's TaskGroup, so it participates natively in the DAG's dependency graph without any special handling.

Related Pages

Implemented By

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment