Principle:Astronomer Astronomer cosmos Graph Parsing and Task Generation
Overview
A core orchestration principle for parsing a dbt project's dependency graph and generating corresponding orchestration tasks with correct dependency wiring. This two-phase process is the engine behind all Cosmos rendering patterns.
Description
The graph parsing and task generation principle operates in two distinct phases:
Phase 1: Graph Loading
The first phase discovers the dbt project's node structure and dependency relationships. Cosmos supports multiple loading strategies, selected via the LoadMode enum:
- AUTOMATIC -- Cosmos selects the best available method based on the environment.
- DBT_LS -- runs dbt ls as a subprocess to discover nodes. This is the most accurate method, but it requires a dbt installation and database connectivity at parse time.
- DBT_MANIFEST -- parses a pre-built manifest.json file. It is fast and does not require dbt at parse time, but the manifest must be kept in sync with the project.
- DBT_LS_FILE -- reads previously saved dbt ls output from a file.
- DBT_LS_CACHE -- reuses a cached dbt ls result instead of running dbt ls on every parse, refreshing it when it becomes outdated.
- CUSTOM -- uses Cosmos's own built-in parser (the legacy dbt project parser) to read the project files directly, without invoking dbt.
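AUTOMATIC is described here only as picking the best available method. One plausible fallback order, assumed for illustration rather than taken from this page, is: a manifest if one is provided, otherwise dbt ls if a dbt executable can be found, otherwise CUSTOM. The function choose_load_mode and its parameters are hypothetical, and it returns plain strings rather than Cosmos's LoadMode members.

```python
import shutil
from pathlib import Path
from typing import Optional


def choose_load_mode(manifest_path: Optional[str], dbt_executable: str = "dbt") -> str:
    """Hypothetical sketch of an AUTOMATIC-style fallback chain.

    Assumed order: an existing manifest file wins, then a dbt executable
    found on PATH, otherwise the CUSTOM mode.
    """
    # Prefer a pre-built manifest: no dbt process is needed at parse time.
    if manifest_path is not None and Path(manifest_path).is_file():
        return "DBT_MANIFEST"
    # Next, use dbt ls if the dbt executable can be located.
    if shutil.which(dbt_executable) is not None:
        return "DBT_LS"
    # Last resort: no manifest and no dbt installation available.
    return "CUSTOM"
```

The order encodes the trade-off stated in the list: the cheapest reliable source of truth is tried first, and the dbt-free parser is kept as a fallback.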
Each strategy produces the same output: a dictionary of DbtNode objects keyed by unique ID, along with their dependency relationships.
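The sketch below illustrates the DBT_MANIFEST path, assuming only the standard dbt manifest layout: top-level nodes and sources sections keyed by unique ID, where each entry has a resource_type and non-source entries list their parents under depends_on.nodes. The Node dataclass and load_from_manifest are hypothetical, simplified stand-ins; Cosmos's own DbtNode carries more fields.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Simplified, hypothetical stand-in for Cosmos's DbtNode."""
    unique_id: str
    resource_type: str
    depends_on: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)


def load_from_manifest(manifest_text: str) -> Dict[str, Node]:
    """Sketch of manifest-based loading: returns nodes keyed by unique ID."""
    manifest = json.loads(manifest_text)
    nodes: Dict[str, Node] = {}
    # "nodes" holds models, tests, seeds and snapshots; sources live in their own section.
    for section in ("nodes", "sources"):
        for unique_id, raw in manifest.get(section, {}).items():
            nodes[unique_id] = Node(
                unique_id=unique_id,
                resource_type=raw["resource_type"],
                # Sources have no upstream nodes, so fall back to an empty list.
                depends_on=list(raw.get("depends_on", {}).get("nodes", [])),
                tags=list(raw.get("tags", [])),
            )
    return nodes
```

A dbt ls based loader would read different input but, per the sentence above, should end up producing the same kind of dictionary.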
Phase 2: Task Generation
The second phase maps each discovered dbt node to an Airflow operator and wires task dependencies:
- Node filtering -- nodes are filtered based on the RenderConfig (select, exclude, resource types).
- Operator selection -- each node's resource type (model, test, seed, snapshot) determines which operator class is used. The execution mode (local, Docker, Kubernetes, etc.) further refines the operator choice.
- Task instantiation -- operators are created with the appropriate arguments, including any operator_args, which are broadcast to every generated task.
- Dependency wiring -- upstream/downstream relationships from the dbt graph are translated into Airflow task dependencies using the >> operator.
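Node filtering and operator selection can be sketched as two small functions. This is a toy under explicit assumptions: filter_nodes understands only tag:<name> selectors (a tiny subset of dbt's selection syntax), and operator_name assumes a Dbt<Verb><Mode>Operator naming pattern. Both functions and the lookup tables are hypothetical; check actual operator class names against your Cosmos version.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class Node:
    unique_id: str
    resource_type: str
    depends_on: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)


# Assumed naming pattern: Dbt<Verb><Mode>Operator.
VERB_BY_RESOURCE = {"model": "Run", "test": "Test", "seed": "Seed", "snapshot": "Snapshot"}
MODE_SUFFIX = {"local": "Local", "docker": "Docker", "kubernetes": "Kubernetes"}


def filter_nodes(nodes: Dict[str, Node], select: Sequence[str] = (),
                 exclude: Sequence[str] = ()) -> Dict[str, Node]:
    """Toy filter: keep nodes matching any select selector and no exclude selector."""
    def matches(node: Node, selector: str) -> bool:
        kind, _, value = selector.partition(":")
        return kind == "tag" and value in node.tags

    kept: Dict[str, Node] = {}
    for uid, node in nodes.items():
        if select and not any(matches(node, s) for s in select):
            continue  # an empty select means "select everything"
        if any(matches(node, s) for s in exclude):
            continue
        kept[uid] = node
    return kept


def operator_name(node: Node, execution_mode: str = "local") -> Optional[str]:
    """Derive an operator class name from resource type and execution mode."""
    verb = VERB_BY_RESOURCE.get(node.resource_type)
    if verb is None:
        return None  # e.g. sources get no task in this sketch
    return f"Dbt{verb}{MODE_SUFFIX[execution_mode]}Operator"
```

Keeping the two steps separate mirrors the list above: filtering decides which nodes become tasks, and selection decides what kind of task each one becomes.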
The result is a set of Airflow tasks with dependencies that mirror the dbt project's DAG.
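Task instantiation and dependency wiring can be illustrated without Airflow by a stand-in task class whose >> records an upstream edge, mimicking how Airflow's a >> b makes b depend on a. FakeTask, build_tasks, and the task-ID naming rule are hypothetical; in this sketch operator_args is simply passed to every task, and edges to nodes that were never rendered (for example, filtered-out sources) are skipped.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class Node:
    unique_id: str
    resource_type: str
    depends_on: List[str] = field(default_factory=list)


class FakeTask:
    """Minimal stand-in for an Airflow operator: records kwargs and upstream task IDs."""

    def __init__(self, task_id: str, **kwargs: Any) -> None:
        self.task_id = task_id
        self.kwargs = kwargs
        self.upstream: Set[str] = set()

    def __rshift__(self, other: "FakeTask") -> "FakeTask":
        # Like Airflow's `a >> b`: b now depends on a; return b to allow chaining.
        other.upstream.add(self.task_id)
        return other


def build_tasks(nodes: Dict[str, Node], operator_args: Dict[str, Any]) -> Dict[str, FakeTask]:
    tasks: Dict[str, FakeTask] = {}
    # Task instantiation: every task receives the same broadcast operator_args.
    for uid in nodes:
        task_id = uid.rsplit(".", 1)[-1]  # hypothetical naming rule: last ID segment
        tasks[uid] = FakeTask(task_id, **operator_args)
    # Dependency wiring: one `>>` per dbt edge whose parent was rendered as a task.
    for uid, node in nodes.items():
        for parent in node.depends_on:
            if parent in tasks:
                tasks[parent] >> tasks[uid]
    return tasks
```

Creating all tasks before wiring any edge means the order of nodes in the dictionary does not matter.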
Usage
This two-phase process happens automatically inside DbtDag and DbtTaskGroup. Understanding it is critical for:
- Debugging graph loading issues -- if tasks are missing or incorrectly ordered, the problem usually lies in Phase 1 (wrong load mode, stale manifest, missing dbt connectivity).
- Customizing node-to-operator mapping -- advanced users can influence which operators are chosen by adjusting the execution mode or by registering custom node converters through the RenderConfig node_converters option.
- Performance tuning -- choosing the right LoadMode affects DAG parse time. DBT_MANIFEST is fastest; DBT_LS is most accurate but slowest.
- Understanding filtering -- the RenderConfig controls which dbt nodes become Airflow tasks via its select and exclude parameters.
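A stale manifest, one of the Phase 1 problems named under debugging, can be spotted with a cheap modification-time check. The helper manifest_is_stale, its default file patterns, and its ignored directories are assumptions for illustration, not part of Cosmos; a time-based check is only a heuristic, since regenerating the manifest is the reliable fix.

```python
from pathlib import Path
from typing import Iterable


def manifest_is_stale(
    manifest_path: str,
    project_dir: str,
    patterns: Iterable[str] = ("*.sql", "*.yml", "*.yaml"),
    ignore_dirs: Iterable[str] = ("target", "dbt_packages", "logs"),
) -> bool:
    """Hypothetical check: True if any project source file is newer than the manifest."""
    manifest_mtime = Path(manifest_path).stat().st_mtime
    root = Path(project_dir)
    ignored = set(ignore_dirs)
    for pattern in patterns:
        for path in root.rglob(pattern):
            # Skip build output and installed packages: compiled SQL under target/
            # is written alongside the manifest and would cause false alarms.
            if ignored.intersection(path.relative_to(root).parts):
                continue
            if path.stat().st_mtime > manifest_mtime:
                return True
    return False
```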
Theoretical Basis
dbt projects define a directed acyclic graph (DAG) of nodes:
- Nodes include models, tests, seeds, snapshots, and sources.
- Edges represent data dependencies declared via the ref() and source() macros.
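To make the edge definition concrete, the sketch below pulls ref() and source() calls out of a model's SQL with regular expressions. This is a deliberate simplification: dbt resolves these calls by rendering Jinja, and the patterns here (an assumption) only handle quoted literal arguments, so forms such as the two-argument ref(package, model) or versioned refs are not covered. extract_edges is a hypothetical name.

```python
import re
from typing import List, Tuple

# Simplified patterns: quoted literal arguments only, no Jinja evaluation.
REF_RE = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")
SOURCE_RE = re.compile(
    r"\{\{\s*source\(\s*['\"]([^'\"]+)['\"]\s*,\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}"
)


def extract_edges(sql: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (referenced model names, (source_name, table_name) pairs) found in SQL."""
    refs = REF_RE.findall(sql)        # one capture group -> list of strings
    sources = SOURCE_RE.findall(sql)  # two capture groups -> list of tuples
    return refs, sources
```

Each returned reference becomes an incoming edge to the model that contains the SQL.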
The graph parsing and task generation principle performs a topological mapping from this dbt graph to an Airflow task graph:
- Each dbt node is mapped to an Airflow operator (preserving node identity).
- Each dbt edge is mapped to an Airflow task dependency (preserving execution order).
- The mapping is structure-preserving -- the Airflow task graph is isomorphic to the (filtered) dbt node graph.
This topological mapping guarantees that:
- No task runs before its dependencies -- the Airflow scheduler enforces the same ordering constraints as dbt.
- Maximum parallelism -- independent branches of the dbt graph execute concurrently.
- Incremental retries -- a failed model can be retried without re-running its upstream dependencies.
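The ordering and parallelism guarantees can be made concrete with Kahn's algorithm, which groups nodes into waves so that every node's upstreams sit in earlier waves and nodes within one wave are independent. execution_waves is a hypothetical helper; the Airflow scheduler does not literally run in waves but starts each task as soon as its own upstreams succeed, so the waves only show which tasks are allowed to run side by side.

```python
from typing import Dict, List, Set


def execution_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group nodes into dependency-respecting waves (Kahn's algorithm).

    `deps` maps each node to the set of nodes it depends on; parents that
    are not themselves keys (e.g. filtered-out nodes) are ignored.
    """
    remaining = {node: set(parents).intersection(deps) for node, parents in deps.items()}
    waves: List[List[str]] = []
    while remaining:
        # Nodes with no unfinished upstreams can all run concurrently.
        ready = sorted(node for node, parents in remaining.items() if not parents)
        if not ready:
            raise ValueError("cycle detected: dbt graphs must be acyclic")
        waves.append(ready)
        for node in ready:
            del remaining[node]
        for parents in remaining.values():
            parents.difference_update(ready)
    return waves
```

For example, two independent staging branches land in the same wave, while a mart that joins both waits for the next one.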
The two-phase design separates discovery (what nodes exist) from construction (how to build tasks), enabling different loading strategies without changing the task generation logic.
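The separation can be expressed as a small interface boundary: any loader that returns the same graph shape can feed an unchanged builder. GraphLoader, StaticLoader, CallableLoader, and build are hypothetical names chosen for this sketch, and the graph is reduced to a mapping from unique ID to upstream IDs.

```python
from typing import Callable, Dict, List, Protocol

Graph = Dict[str, List[str]]  # unique ID -> upstream unique IDs


class GraphLoader(Protocol):
    """Phase 1 interface: every loading strategy returns the same graph shape."""

    def load(self) -> Graph: ...


class StaticLoader:
    """Stands in for a manifest- or dbt-ls-based loader."""

    def __init__(self, graph: Graph) -> None:
        self._graph = graph

    def load(self) -> Graph:
        return dict(self._graph)


class CallableLoader:
    """Wraps any zero-argument function that produces a graph."""

    def __init__(self, fn: Callable[[], Graph]) -> None:
        self._fn = fn

    def load(self) -> Graph:
        return self._fn()


def build(loader: GraphLoader) -> List[str]:
    """Phase 2 (simplified): emit 'parent >> child' edges, unaware of how the graph was loaded."""
    graph = loader.load()
    return sorted(
        f"{parent} >> {child}"
        for child, parents in graph.items()
        for parent in parents
        if parent in graph
    )
```

Swapping one loader for another changes only how the graph is discovered; the edges produced by build stay the same.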
Related Pages
Implemented By
- Implementation:Astronomer_Astronomer_cosmos_DbtGraph_Load_And_Build
- Implementation:Astronomer_Astronomer_cosmos_LegacyDbtProject_Parser